Preface



This manual describes the technical features of the AMD-K5™ processor, and its dif- 
ferences from the Pentium processor, at a level of detail suitable for a hardware 
designer or system-software developer to implement system boards, core system 
logic, and system software. Specifically, the manual describes the following aspects 
of the processor:

■ Internal architecture 

■ Software differences from the 486 and Pentium processors 

■ Performance parameters 

■ Bus signal functions

■ Bus cycle timing 

■ Design issues for system-board designs 

■ Test and debugging features 

A full description of the x86 programming environment is beyond the scope of this 
manual. Instead, the software sections describe differences from the 486 processor’s 
programming environment. A list of commercial books that describe the x86 pro- 
gramming environment and other subjects of potential interest appears at the end of 
this preface. 

In addition to descriptions of the AMD-K5 processor’s unique internal architecture, 
the manual incorporates details about the behavior of bus signals and bus cycles that 
are standard to the x86 processors but that are not fully documented in other x86 
manuals. 

Notation 

The following notation is used in this manual: 

b — Binary 

d — Decimal 

h — Hexadecimal 

Set — Written with a value of 1 

Clear — Written with a value of 0 

GP (0) — General-protection exception (13 decimal) with an error value of 0 
EFLAGS.IF — The IF bit in the EFLAGS register

CS.EIP — A logical address, expressed as a segment selector (CS) and offset (EIP) 

000F_FFF0h — A physical-memory address using hexadecimal notation 
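The suffix notation above can be made concrete with a small parser. This is a minimal sketch and not part of the manual. `parse_manual_number` is a hypothetical helper, and it assumes that underscores, as in 000F_FFF0h, serve only as digit-group separators.

```python
def parse_manual_number(text: str) -> int:
    """Convert a number written in this manual's notation to an int.

    A trailing 'b' means binary, 'd' decimal, and 'h' hexadecimal.
    Underscores are treated as digit-group separators (assumption).
    """
    bases = {"b": 2, "d": 10, "h": 16}
    digits, suffix = text[:-1], text[-1].lower()
    if suffix not in bases:
        raise ValueError(f"missing b/d/h suffix: {text!r}")
    return int(digits.replace("_", ""), bases[suffix])


# The physical-memory address from the notation list above.
print(hex(parse_manual_number("000F_FFF0h")))  # 0xffff0
print(parse_manual_number("1010b"))            # 10
print(parse_manual_number("13d"))              # 13
```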

Terminology 

The following definitions apply throughout this document: 

■ Pin and Signal — A pin is a piece of metal on the processor’s package. A signal is 
the information about logical states that a pin carries. Pins have pin numbers; sig- 
nals have signal names. On processors that multiplex signals, pins can carry more 
than one signal; the AMD-K5 processor, however, does not multiplex signals in 
this manner. 

■ Assert and Negate — A signal that is driven or sampled active is asserted. A signal 
that is inactive is negated. In general, asserted means sampled asserted either by 
the processor or target logic. Signals that are active in a Low-voltage state, such as 
BRDY, are shown with an overbar. Signals that are active in a High-voltage state, 
such as INTR, are shown without an overbar. Dual-state signals, such as W/R and
WB/WT, have two states of assertion and, therefore, the term asserted has no 
meaning; such dual-state signals are driven High or Low. 

■ Drive and Sample — A single-state signal is driven when it is asserted or negated by 
a logic device; it is sampled when its driven state is detected by another device. 

■ Cycle and Clock — These terms commonly refer to at least four different things:

• Bus-clock period: The cycle time of the CLK signal. 

• Processor-clock period: The cycle time of the processor’s internal clock, which
has a frequency relative to CLK that is determined by the state of the BF
signal(s) during RESET. Whenever this cycle is meant, such as in the Chapter 4
description of pipeline timing and instruction latency, the full name,
processor-clock cycle, is used.

• Bus cycle: A signal protocol on the processor’s bus, such as a single-transfer 
read cycle or a special bus cycle. 

• Sequence of bus cycles: One or more contiguous bus cycles, which this manual
calls a bus operation. For example, the two bus cycles that constitute an interrupt
acknowledgment are called a bus operation, so that the constituent bus cycles can
be distinguished from the entire operation.
■ Writeback — This term refers to two related concepts:

• Bus Cycle — A 32-byte burst write cycle to a memory block that has been cached 
in the modified state. Writebacks can be caused by inquire cycles, internal 
snoops, writeback and invalidate operations (such as FLUSH or the WBINVD 
instruction), cache-line replacements, or locked operations on cached loca- 
tions. It is sometimes called a copyback. 

• Cache-Line State — A cache line in the modified or exclusive state of the MESI
(modified, exclusive, shared, invalid) protocol.

■ Writethrough — This term refers to two related concepts: 

• Bus Cycle — A 1-to-8-byte, single-transfer write cycle caused by write misses or
write hits to lines in the shared or exclusive MESI state.

• Cache-Line State — A cache line in the shared MESI state. 

■ Flush — This term commonly refers to at least four things and is usually avoided in 
favor of the following specific terms: 

• Pipeline Invalidation: A pipeline-flush operation invalidates instructions in the 
pipeline that have not been retired (and, depending on the type of pipeline in- 
validation, entries in the reorder buffer, entries in the TLB, and/or branch-pre- 
diction bits) without writing their state to any storage resource. 

• Cache Invalidation: The INVD instruction invalidates the contents of the in- 
struction and data caches, without writing modified data back to memory. 

• Cache Writeback and Invalidation: The WBINVD instruction writes modified 
lines in the data cache back to memory while invalidating each line in the in- 
struction and data caches. 

• FLUSH Operation: Asserting the FLUSH input signal invokes the same microcode
routine as the WBINVD instruction, which writes modified lines in the data cache
back to memory while invalidating each line in the instruction and data caches.

■ Flush Acknowledge Cycle — This term commonly refers to different types of special 
bus cycles driven by the processor, and is therefore avoided in favor of the follow- 
ing specific terms: 

• FLUSH Acknowledge: A special bus cycle driven after the FLUSH operation 
completes. 

• INVD Acknowledge: A special bus cycle driven after the INVD cache invalida- 
tion completes. 

• WBINVD Acknowledge: A sequence of two special bus cycles driven after the 
WBINVD cache writeback and invalidation completes. 

■ Snoop — This term commonly refers to at least three different actions and is there- 
fore avoided in favor of the following specific terms: 

• Inquire Cycles: These are bus cycles driven by system logic. They cause the pro- 
cessor to compare the inquire-cycle address with the processor’s physical 
cache tags. The AMD-K5 and Pentium processors both support inquire cycles.

• Internal Snooping: These snoops are initiated by the processor (rather than sys- 
tem logic) during certain types of cache accesses. Both the AMD-K5 and Pen- 
tium microprocessors support this type of internal snooping for the purpose of 
detecting self-modifying code. See page 2-22 for details. 

• Bus Watch: Some caching devices watch their address and data bus continu- 
ously while they are held off the bus. They compare every address driven by 
another bus master with their internal cache tags, and they may also be able to 
update their cached lines during writebacks to memory by another bus master. 
Neither the AMD-K5 nor Pentium microprocessors support bus watching. 

■ Cold and Warm Reset — The terms cold or hard reset and warm or soft reset are 
commonly used to mean three related but different things, and the terms are 
therefore avoided. A cold or hard reset typically refers to the assertion of RESET 
at power-up, but warm or soft reset can refer either to the assertion of RESET 
after power-up or to the assertion of INIT. 

■ System Logic — Any logic outside the processor, including a core-logic chipset, 
another bus master, or separate controllers for L2 cache, memory, interrupts, 
DMA, communications, video, bus bridging, bus arbitration, or any other system 
function. 
Overview



The AMD-K5™ processor brings superscalar RISC perfor- 
mance to desktop systems running industry-standard x86 soft- 
ware. The processor implements advanced design techniques 
like instruction pre-decoding, single-cycle internal RISC opera- 
tions, parallel execution units, out-of-order issue and comple- 
tion, register renaming, data forwarding, and dynamic branch 
prediction. The processor’s many test and debug features sup- 
port fast, reliable designs for x86 desktop systems. 

AMD’s development and support of the popular Am386® and 
Am486® processors has given it a broad foundation of experi- 
ence in the x86 architecture. The AMD-K5 processor’s binary 
compatibility with DOS and Windows®-compatible software 
running on the Pentium processor and all previous x86 proces- 
sors has been established in extensive testing, using industry- 
standard test tools. Compatibility and qualification testing has 
also been provided by leading desktop-system manufacturers, 
chip-set manufacturers, and the independent XXCAL testing
laboratory. 

The result can be seen in the AMD-K5 processor’s perfor- 
mance. This performance plus its compatibility with an 
immense library of existing x86 software makes the AMD-K5
processor a leading-edge solution for desktop systems. 
1.1 Features

■ Pentium-Processor Standard 

• Compatible with the Pentium (735\90, 815\100)
processor 296-pin socket 

• Compatible with existing Pentium (735\90, 815\100) 
processor support infrastructure and system designs 

• Compatible with Pentium, 486, and 386 processor soft- 
ware 

• Compatible with x86 DOS, Microsoft® Windows® operat- 
ing system, and the large installed base of x86 software 

• Compatible with IEEE 854 floating-point standard 

• Selectable bus frequencies 

• Support for multiprocessing 

■ High-Performance Execution 

• Six execution units (two ALUs, two load/store, one 
branch, one floating-point) 

• Up to four instructions issued per processor clock 

• Out-of-order issue and completion 

• Speculative execution along three predicted branches 

• Register renaming 

• Data forwarding 

• Predecoder converts x86 instructions to single-cycle 
RISC operations (ROPs) 

• Fast integer multiply (4-cycle, fully pipelined) 

• Five-stage pipeline 

• Single-cycle cache access 

• Zero-delay branching, 3-clock misprediction penalty (of- 
ten hidden) 

• No mixed-operand-size penalty 

• No prefix penalty 

• Single-cycle misalignment penalty 

• No instruction-pairing requirements for parallel issue 

• No pipeline invalidation on segment loads 

• Efficient support for 16- and 32-bit code, with mixed op- 
erand sizes 
■ High-Performance Cache and TLBs

• 16-Kbyte instruction cache supports split-line access 

• 8-Kbyte, dual-ported data cache with MESI cache
coherency protocol

• Dual-tagged (both linear and physical tags) 

• Inquire cycles run in parallel with program cache access 

• 4-Kbyte TLB (128 entries) and 4-Mbyte TLB (4 entries) 

■ Extended Features 

• Control Register 4 (CR4) 

• CMPXCHG8B instruction 

• CPUID instruction 

• Time stamp counter (TSC) 

• Machine-Specific Registers (MSRs) 

• 4-Mbyte page size 

• Global pages held in TLB during flushes 

■ Low Power 

• Static, 3.3-V design 

• System Management Mode (SMM) with I/O trapping 

• Low-power halt and stop-clock states 

• Compatible with U.S. Department of Energy’s Energy 
Star program 

• Compatible with Microsoft Advanced Power Manage- 
ment specification 

■ Extensive Test and Debug Features 

• Two built-in self-test (BIST) modes 

• Output-Float Test mode 

• Cache and TLB testing (tags and data) 

• Debug registers, with I/O breakpoint extension 

• Branch tracing 

• Functional-redundancy checking 

• IEEE 1149.1-1990 Test Access Port (TAP) and JTAG 
boundary-scan testing 

• Hardware Debug Tool (HDT) 



Internal Architecture



The RISC design techniques used in the processor’s internal 
architecture account, in large part, for its high performance. 
The following sections summarize the processor’s execution 
pipeline behavior, the hardware aspects of the internal instruc- 
tion cache and data cache, and the hardware aspects of mem- 
ory management. 

Figure 2-1 shows the major logic blocks that make up the inter- 
nal architecture. The blocks are organized in the figure by 
stages of the processor’s execution pipeline, which are listed 
vertically on the right side of the figure. The blocks are 
explained in the sections that follow.

In this chapter, the terms clock and cycle refer to processor- 
clock cycles. If bus-clock cycles or bus cycles are discussed, 
they are explicitly named. The processor clock runs at a
multiple of the bus-clock (CLK) frequency, as determined by the BF
input signal(s) and processor model number. 
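The relationship between the two clocks can be sketched as simple arithmetic. The multiplier is passed in as a parameter because this section does not give the mapping from BF signal states to ratios. The function names, the 1.5 ratio, and the 66-MHz bus clock below are illustrative assumptions only.

```python
from fractions import Fraction


def processor_clock_mhz(bus_clock_mhz: float, ratio: Fraction) -> float:
    """Processor-clock frequency = bus-clock (CLK) frequency x ratio.

    The ratio is set by the BF input signal(s) sampled during RESET;
    the BF-to-ratio mapping is not modeled here.
    """
    return bus_clock_mhz * float(ratio)


def bus_clocks_for(processor_clocks: int, ratio: Fraction) -> Fraction:
    """Number of bus-clock periods spanned by a run of processor clocks."""
    return Fraction(processor_clocks) / ratio


ratio = Fraction(3, 2)                     # hypothetical 1.5x multiplier
print(processor_clock_mhz(66.0, ratio))    # 99.0
print(bus_clocks_for(3, ratio))            # 2
```

At a 1.5 ratio, for example, an event lasting three processor clocks spans two bus clocks. Using `Fraction` keeps non-integer ratios exact.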
Figure 2-1. Internal Architecture, with Pipeline Stages
2.1 Prefetch and Predecode



Figure 2-1 (top-left corner) shows the processor’s prefetch and 
predecode logic being fed with data from the external bus via 
the memory management unit. Prefetching attempts to keep 
the instruction cache and prefetch cache filled ahead of the 
execution pipeline’s fetch requirements. The processor only 
prefetches during fetch-stage misses in the instruction cache, 
which typically occur during taken branches. 

When a miss occurs, the prefetcher initiates a 32-byte burst 
memory read cycle on the bus to fill a prefetch cache. For cache- 
able accesses, the prefetch cache also fills 32-byte lines in the 
instruction cache. For non-cacheable accesses, the prefetch 
cache provides instructions directly to the execution pipeline. 

The instruction cache contains a copy of certain fields in the 
current code-segment descriptor. During a taken branch, the 
fetch logic adds the code-segment base to the effective address 
and places the resulting linear address in the prefetch program 
counter, which then increments as a linear address along a 
sequential stream. All branches during prefetching are 
assumed to be not taken. 

The processor predecodes its x86-instruction stream in the 
same clock in which x86 instructions come out of the prefetch 
cache. An x86 instruction can be from 1 to 15 bytes long. Prede- 
coding annotates each instruction byte with information that 
later enables the decode stage of the pipeline to perform more 
efficiently. The predecode information identifies whether the 
byte is the start and/or end of an x86 instruction, whether it is 
an opcode byte, and the number of internal RISC operations 
(ROPs) it will require at the decode stage. The predecode 
information is stored in the instruction cache with each x86 
instruction byte. It is passed during instruction fetching to the 
decode stage, where it allows multiple x86 instructions to be 
decoded in parallel. This avoids delaying the decode of one 
instruction until the decode of the prior instruction has deter- 
mined its ending byte. 
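The predecode idea can be sketched in a few lines. Real hardware must derive instruction lengths from the x86 encoding; here they are simply supplied. Each byte is tagged with start and end flags, so a decoder can locate every instruction start in a 16-byte window at once instead of walking the instructions one by one. The `PredecodeBits` fields mirror the information listed above. The field layout, the opcode-offset input, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PredecodeBits:
    start: bool = False   # first byte of an x86 instruction
    end: bool = False     # last byte of an x86 instruction
    opcode: bool = False  # byte is the opcode byte
    rop_count: int = 0    # ROPs needed (kept on the start byte in this sketch)


def predecode(instrs: List[Tuple[int, int, int]]) -> List[PredecodeBits]:
    """instrs: (length, opcode_offset, rop_count) per instruction, in order."""
    bits: List[PredecodeBits] = []
    for length, opcode_offset, rops in instrs:
        if not 1 <= length <= 15:
            raise ValueError("x86 instructions are 1 to 15 bytes long")
        for i in range(length):
            bits.append(PredecodeBits(start=(i == 0),
                                      end=(i == length - 1),
                                      opcode=(i == opcode_offset),
                                      rop_count=rops if i == 0 else 0))
    return bits


def instruction_starts(bits: List[PredecodeBits], window: int = 16) -> List[int]:
    """Byte offsets of instruction starts in the first `window` bytes.

    Each byte is examined independently, which is what lets hardware
    find all starts in parallel rather than sequentially.
    """
    return [i for i, b in enumerate(bits[:window]) if b.start]


# Hypothetical stream: a 1-byte, a 3-byte (one prefix byte), and a 6-byte instruction.
bits = predecode([(1, 0, 1), (3, 1, 1), (6, 0, 2)])
print(instruction_starts(bits))  # [0, 1, 4]
```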



2.2 Execution Pipeline



Figure 2-1 shows the relation between the internal logic and 
the stages of the execution pipeline. Figure 2-2 shows the func- 
tions of the pipeline stages. The first five stages — Fetch, 
Decode 1, Decode 2, Execute, and Result — affect throughput 
performance. The sixth stage, Retire, may occur at a variable 
number of clocks after the Result stage, but the Retire stage 
does not affect throughput performance when the processor 
operates in a non-serialized mode, which is typical of most pro- 
cessing. Thus, the pipeline effectively has five stages. Because 
the pipeline is moderately shallow, penalties associated with 
mispredicting a branch (three clocks) or clearing the pipeline 
(variable clocks) are relatively small compared with processors 
that have deeper pipelines (more pipeline stages). 
2.2.1 Fetch



The processor can fetch up to 16 bytes per clock out of the 
instruction cache. Fetching begins with the calculation of the 
linear address for the next instruction along a predicted 
branch of the x86 instruction stream. The address accesses the 
instruction cache or, during a miss, the prefetch cache. Fetch- 
ing can occur along a single execution stream with up to three 
taken branches. Fetches that miss both the instruction cache 
and prefetch cache are driven to the prefetcher. 

In addition to fetching instructions, the fetch logic handles 
branch predictions and detects conditions requiring pipeline 
invalidation and restarting, such as context switches or 
branches into cache lines that do not contain the correct prede- 
code state. Branches are dynamically predicted on a cache-line 
basis using a 1-bit algorithm. Each of the 1024 instruction- 
cache lines has a tag that predicts the last byte in the cache 
line to be executed, whether or not the branch will be taken, 
and the cache index of the branch target (called the successor 
index). When the caches are invalidated, all branch predictions 
are cleared. 
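The per-line, 1-bit scheme described above can be modeled as a table with one entry per instruction-cache line. This is a simplified sketch. It keeps only the taken/not-taken bit and the successor index, uses a hypothetical `LinePredictor` class, and omits the predicted-last-byte field.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LineEntry:
    taken: bool = False              # 1-bit prediction: branch in this line taken?
    successor: Optional[int] = None  # cache index of the predicted branch target


class LinePredictor:
    def __init__(self, lines: int = 1024):
        self.lines = lines
        self.table: Dict[int, LineEntry] = {}

    def predict(self, line: int) -> LineEntry:
        # Lines with no recorded history default to not-taken.
        return self.table.get(line % self.lines, LineEntry())

    def update(self, line: int, taken: bool, target_line: Optional[int]) -> None:
        # A 1-bit scheme simply records the most recent outcome.
        self.table[line % self.lines] = LineEntry(taken, target_line if taken else None)

    def invalidate(self) -> None:
        # Invalidating the caches clears all branch predictions.
        self.table.clear()


p = LinePredictor()
print(p.predict(5).taken)   # False
p.update(5, taken=True, target_line=42)
print(p.predict(5))         # LineEntry(taken=True, successor=42)
p.invalidate()
print(p.predict(5).taken)   # False
```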

During prefetch all branch instructions are predicted as not- 
taken. Later, if the execution of a branch instruction reveals a 
misprediction, the fetch unit backs out of the branch by invali- 
dating all speculative states in the prefetch cache, reorder 
buffer, load/store reservation station, and store buffer. Then, 
for cacheable instructions, the branch prediction stored in the 
instruction cache is updated while the correct branch target is 
fetched. Prediction updates are disabled when the branch 
instruction is non-cacheable, because no prediction informa- 
tion is saved for non-cacheable instructions. 

In typical x86 desktop programs, a branch occurs about once 
every seven x86 instructions. Without branch prediction, 
branch targets remain unresolved until the execution phase, 
which creates pipeline delays. The processor’s branch-predic- 
tion mechanism accurately predicts 70% to 85% of branches 
(depending on program behavior) and has a misprediction pen- 
alty of only three processor clocks. 
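Combining the figures in this paragraph gives a rough sense of the average cost. The sketch below assumes that every mispredicted branch pays the full three-clock penalty. The Features list notes that this penalty is often hidden, so the result is an upper bound. It also assumes that branches occur once every seven instructions. The function name is hypothetical.

```python
def mispredict_clocks_per_instruction(accuracy: float,
                                      instrs_per_branch: float = 7.0,
                                      penalty_clocks: int = 3) -> float:
    """Average misprediction penalty, in processor clocks per x86 instruction."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * penalty_clocks / instrs_per_branch


for acc in (0.70, 0.85):
    print(f"{acc:.0%}: {mispredict_clocks_per_instruction(acc):.3f} clocks/instruction")
# 70%: 0.129 clocks/instruction
# 85%: 0.064 clocks/instruction
```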










18524C/0 — Nov 1996




AMD-K5 Processor Technical Reference Manual 



2.2.2 Decode 



The two-stage decode logic accepts predicted x86 instruction 
bytes and their predecode bits from the fetch logic, shifts them 
into a 16-byte FIFO buffer called the byte queue, merges
register tags and operands, and generates internal RISC operations
(ROPs). The decode logic also generates microcode entry 
points for complex instructions, interrupts and exceptions, and 
several other functions, and it manages the floating-point 
stack. 

ROPs are fixed-format internal instructions with up to three 
operands. Most ROPs execute in a single clock. The operands 
(up to two source and one destination) can be 1, 2, or 4 bytes
wide, or half of an 8- or 10-byte floating-point operand. ROPs
can be combined to perform every function of an x86 instruc- 
tion. One x86 instruction can be decoded into as few as one 
ROP (for example, a register-to-register add), or it can be 
decoded into several ROPs, depending on its complexity. 

The processor uses a combination of hardware and microcode 
to convert x86 instructions into ROPs. The hardware consists 
of four parallel fastpath converters that translate the most 
commonly used x86 instructions (moves, shifts, branches, 
ALUs) into one, two, or three ROPs. Translations requiring 
more than three ROPs (complex instructions, serializing condi- 
tions, interrupts and exceptions, etc.) are handled by micro- 
code. Microcode generates the same types of ROPs as the 
fastpath hardware but in streams longer than three. The prede- 
code information stored with each x86 instruction byte speci- 
fies the number of ROPs that instruction requires, or it 
specifies that microcode is required. The decoder provides the 
entry point into microcode for complex operations. 
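The division of labor between the fastpath converters and microcode reduces to a threshold on the ROP count that predecode records for each instruction. In this minimal sketch, `Rop`, `route`, and the convention that `None` means microcode-only are hypothetical. Only the three-ROP limit and the three-operand ROP format come from the text.

```python
from dataclasses import dataclass
from typing import Optional

FASTPATH_MAX_ROPS = 3  # fastpath converters emit one, two, or three ROPs


@dataclass(frozen=True)
class Rop:
    """Fixed-format internal operation: up to two sources, one destination."""
    op: str
    dest: Optional[str] = None
    src1: Optional[str] = None
    src2: Optional[str] = None


def route(rop_count: Optional[int]) -> str:
    """Choose the converter for an instruction from its predecoded ROP count.

    None stands for predecode marking the instruction as microcode-only.
    """
    if rop_count is not None and 1 <= rop_count <= FASTPATH_MAX_ROPS:
        return "fastpath"
    return "microcode"


# A register-to-register add decodes to a single ROP.
add = Rop("add", dest="EAX", src1="EAX", src2="EBX")
print(route(1), route(3), route(5), route(None))
# fastpath fastpath microcode microcode
```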

Pipeline serialization (or synchronization) is handled at the 
decode stage. When the processor decodes a serializing instruc- 
tion, it stops decoding at that instruction, waits for all previ- 
ously decoded instructions to retire (described in Section 2.2.5 
on page 2-12), then decodes and executes through retirement 
the serializing instruction before decoding any additional 
instructions. Thus, the serializing instruction is guaranteed to 
execute in program order. 



The serializing instructions include OUTx, invalidations
(INVD, WBINVD, INVLPG), interrupt returns (IRET, IRETD, 
RSM), descriptor-table-register and task-register loads (LGDT, 
LLDT, LIDT, LTR), moves to control or debug registers (MOV 
to CRx or DRx), model-specific register instructions (RDMSR, 
WRMSR), and CPUID. Special bus cycles and interrupt- 
acknowledge operations also serialize the pipeline. INx 
instructions are not executed until the store buffer and write- 
back buffers are drained of any pending writes. 
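The decode-stage serialization rule can be expressed as a function that splits an instruction stream into groups. Each serializing instruction waits for everything before it to retire and forms its own group, and nothing after it is decoded until it has retired. The mnemonic set is taken from the list above, with two assumptions: OUTx is expanded to the OUT family, and MOV to CRx/DRx is written as the hypothetical tokens MOV_CR and MOV_DR. Special bus cycles, interrupt acknowledges, and the INx store-buffer drain are not modeled.

```python
from typing import Iterable, List

# Serializing instructions named in the text; "OUTx" is expanded to the
# OUT family and MOV to CRx/DRx is written as MOV_CR/MOV_DR (both assumptions).
SERIALIZING = {
    "OUT", "OUTS", "OUTSB", "OUTSW", "OUTSD",
    "INVD", "WBINVD", "INVLPG",
    "IRET", "IRETD", "RSM",
    "LGDT", "LLDT", "LIDT", "LTR",
    "MOV_CR", "MOV_DR",
    "RDMSR", "WRMSR", "CPUID",
}


def serialization_groups(stream: Iterable[str]) -> List[List[str]]:
    """Split a decoded stream at serializing instructions.

    Instructions within a group may execute out of order; every group
    boundary is a point where the pipeline must drain to retirement.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for mnemonic in stream:
        if mnemonic.upper() in SERIALIZING:
            if current:
                groups.append(current)  # earlier work must retire first
            groups.append([mnemonic])   # serializing instruction runs alone
            current = []
        else:
            current.append(mnemonic)
    if current:
        groups.append(current)
    return groups


print(serialization_groups(["MOV", "ADD", "CPUID", "SUB", "WRMSR", "JMP"]))
# [['MOV', 'ADD'], ['CPUID'], ['SUB'], ['WRMSR'], ['JMP']]
```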

The four converters that generate fastpath or microcode ROPs 
dispatch up to four ROPs in parallel per clock to the execution 
unit reservation stations. 



2.2.3 Execute 



The processor has the following execution units that work in 
parallel with one another: 

■ Two ALUs (integer, logic, and shift operations) 

■ One floating-point unit 

■ Two load/store units 

■ One branch unit 

Each execution unit has its own FIFO reservation station with 
two or four entries. ROPs are dispatched to reservation sta- 
tions in program order. One ROP can be dispatched to a single 
reservation station in a given clock, thus up to four reservation 
stations receive an ROP each clock. ROPs are issued from a res- 
ervation station to its execution unit when all operands are 
available from the register file, reorder buffer, or prior execu- 
tion via forwarding (including from data cache loads), and 
when the execution unit has completed its prior ROP. Issue 
and dispatch occur in the same clock if the operands are avail- 
able and the unit is free at dispatch time. 
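The issue rule for a single reservation station can be sketched as follows. Entries leave the FIFO in order, and the oldest entry issues only when all of its operands are available and the execution unit is free. Operand availability is modeled as a set of ready tags. The class and method names are hypothetical, and the two-entry default is one of the two station sizes mentioned above.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Set, Tuple


@dataclass
class ReservationStation:
    capacity: int = 2
    entries: Deque[Tuple[str, Tuple[str, ...]]] = field(default_factory=deque)

    def dispatch(self, rop: str, operand_tags: Tuple[str, ...]) -> bool:
        """Accept an ROP in program order; False means the station is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((rop, operand_tags))
        return True

    def issue(self, ready: Set[str], unit_free: bool) -> Optional[str]:
        """Issue the oldest ROP if its operands are ready and the unit is free."""
        if not self.entries or not unit_free:
            return None
        rop, tags = self.entries[0]
        if all(t in ready for t in tags):
            self.entries.popleft()
            return rop
        return None  # in order within this station: younger ROPs wait too


rs = ReservationStation()
rs.dispatch("add r1", ("t1", "t2"))
rs.dispatch("sub r3", ("t3",))
print(rs.issue(ready={"t1"}, unit_free=True))               # None (t2 pending)
print(rs.issue(ready={"t1", "t2", "t3"}, unit_free=True))   # add r1
print(rs.dispatch("or r4", ()), rs.dispatch("and r5", ()))  # True False
```

Because each unit has its own station, different stations can issue in a different order from one another even though each station is strictly FIFO. That is the source of out-of-order execution described in the next paragraph.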

While ROPs are issued in order to a particular execution unit, 
ROPs go out of order at the point of issue because reservation 
stations issue ROPs at different times relative to each other. 
The use of reservation stations and out-of-order execution 
reduces instruction stalls due to dependencies on execution 
resources and allows a higher issue rate to be maintained. Mul- 
tiple values for the same register are resolved by providing 
tags for each register value (register renaming). True data
dependencies are resolved using forwarding at all execution
units. Antidependencies (in which later instructions produce a
value that overwrites one used by an earlier instruction) are
removed automatically by buffering operands, or tags that
point to operands, at reservation stations. Output
dependencies (in which later instructions must be seen by
software to complete after earlier instructions in order to
leave the correct value in a register) are resolved by the
reorder buffer.

Reservation stations are supplied with operands over eight
41-bit operand buses. Execution results are sent to the
reorder buffer (ROB) over five 41-bit result buses. Tags
forwarded to the execution units represent results to watch
for on one of the result buses.

No special compiler optimizations are required for
high-performance execution on the AMD-K5 processor.

Integer/Shift Units

Two ALUs perform integer, logic, and shift operations. Both
ALUs have two-entry reservation stations. Table 2-1 shows the 
types of ROPs executed by each ALU. Unlike the Pentium pro- 
cessor, the AMD-K5 processor has few restrictions on the pair- 
ing of integer instructions needed to use both integer units in 
parallel. 



Table 2-1. ALU Instruction Classes

Instruction Class        ALU0   ALU1
Addition                 Yes    Yes
Subtraction              Yes    Yes
Logical                  Yes    Yes
Compare                  Yes    Yes
Packed BCD               Yes    No
Unpacked BCD             Yes    No
Special (ADDC, SUBB)     Yes    No
Shift                    No     Yes
Divide                   Yes    No


Floating-Point Unit

The IEEE 854-compatible floating-point unit (FPU) can issue
pipelined ROPs from its 2-entry reservation station at the rate
of one per clock. One ROP can be issued to either the add or
multiply pipeline in each clock, even when the operations are
separated by an exchange ROP. The add and multiply pipelines
use a common pre-detect unit and rounder. The rounder can
return one result per clock.

When data is loaded from memory, it is converted to an
internal 82-bit extended format before being stored in the
stack. The format uses two of the internal 41-bit operand or
result buses.

Load/Store Units

Two load/store units read and write data-cache and memory
operands. A shared, 4-entry reservation station buffers
incoming ROPs, and a shared, 4-entry store buffer accepts
outgoing speculative-state operands destined for the data cache
or memory. The reservation station is dual-ported and the store
buffer is single-ported, so that the processor can perform two
loads or one load and one store per clock.

Each unit holds copies of segment-descriptor fields so that it
can calculate logical and linear addresses and check protection
variables and segment limits. Data loaded by one instruction in
a load/store unit can be used by another instruction in another
execution unit in the next clock. There is no load-use penalty.
The data cache can be accessed in a single clock. These low
latencies provide an important performance advantage
because a majority of x86 instructions in typical desktop
programs involve memory as one of their operands.

The load/store units can service two accesses in parallel (two
loads or one load and one store), except when a load and a
store access the same data-cache index and bank, or when one
of the accesses is an I/O load, a locked access, a
segment-descriptor load, a data breakpoint, or the first half
of a misaligned access.

Branch Unit

The branch unit has a 2-entry reservation station and executes
correctly predicted branches with zero delay. The unit exe- 
cutes calls, returns, conditional jumps, conditional byte-sets, 
floating-point exchanges, and microbranches. Speculative exe- 
cution occurs whenever a conditional-branch instruction exe- 
cutes. The branch unit is the only execution unit that decodes 
condition codes and supports speculative flag input operands. 
The branch unit receives branch-prediction information from
the decoder. If the branch unit executes a branch differently 
than predicted, it signals the instruction cache, reorder buffer, 
and decode logic, and it passes the correct information to the 
branch-prediction array in the fetch stage. 



2.2.4 Result 



The processor implements a 16-entry reorder buffer (ROB) for 
speculative-state register renaming, and a 4-entry store buffer 
for speculative-state buffering between the load/store units 
and the data cache. An ROP is said to complete when the result 
of its execution is written to the ROB or store buffer. Results 
may be returned out of order. Results written to the ROB are 
simultaneously forwarded (that is, fed back) to all execution 
units. 

An entry tag is allocated at the top of the ROB for each ROP 
that is dispatched to a reservation station. Entries for up to 
four ROPs can be allocated simultaneously. Among other 
things, the ROB keeps track of the program counter associated 
with each instruction, resolves ROP-level dependencies, stores 
speculative results, provides the most recent copy of a register 
to execution units, recovers from mispredicted branches with- 
out altering real state, and provides substitute tags to internal 
resources when required operands are still outstanding. 

The x86 architecture defines only eight general-purpose regis- 
ters and eight entries in the floating-point stack. This limited 
set of registers leads to register dependencies and register 
reuse. The processor overcomes register dependencies by 
renaming registers in the ROB, and it overcomes register reuse 
with data forwarding. Data forwarding provides execution 
results immediately to other instructions without waiting for 
results to be written to and read back from registers, the data 
cache, or memory. Multiple speculative-state registers for each 
real-state register enable different execution units to use the 
same logical register simultaneously. When the register file 
detects multiple writes to the same real-state register, only the 
latest write in program order is performed — all other writes 
are discarded. Multiple reads of the same real-state register 
are performed without detection or special handling. 
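Register renaming through the ROB can be illustrated with a table that maps each architectural register to the tag of the newest in-flight ROP that writes it. Sources read that tag instead of the register itself, so two in-flight writes to EAX no longer conflict. This is a simplified sketch. The `Renamer` class and the `rob<N>` tag format are hypothetical, and forwarding and retirement are not modeled. Because entries are never freed, the 16-entry limit simply stops allocation.

```python
from typing import Dict, List, Tuple


class Renamer:
    def __init__(self, rob_entries: int = 16):
        self.rob_entries = rob_entries
        self.next_tag = 0
        self.latest: Dict[str, str] = {}  # architectural register -> newest ROB tag

    def rename(self, dest: str, srcs: List[str]) -> Tuple[str, List[str]]:
        """Allocate a ROB tag for `dest` and resolve each source.

        A source written by an in-flight ROP reads that ROP's tag; otherwise
        it reads the real-state register file (shown as the register name).
        """
        if self.next_tag >= self.rob_entries:
            raise RuntimeError("ROB full: dispatch stalls (retirement not modeled)")
        resolved = [self.latest.get(r, r) for r in srcs]  # read sources first
        tag = f"rob{self.next_tag}"
        self.next_tag += 1
        self.latest[dest] = tag
        return tag, resolved


rn = Renamer()
print(rn.rename("EAX", ["EBX"]))         # ('rob0', ['EBX'])
print(rn.rename("ECX", ["EAX"]))         # ('rob1', ['rob0'])  true dependency
print(rn.rename("EAX", ["EDX"]))         # ('rob2', ['EDX'])   second EAX write renamed
print(rn.rename("ESI", ["EAX", "ECX"]))  # ('rob3', ['rob2', 'rob1'])
```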



2.2.5 Retire



The processor implements a real-state (non-speculative) regis- 
ter file that contains the x86-architecture registers and a real- 
state 8-Kbyte data cache. While ROPs complete out of order 
and their results are forwarded to other execution units and to 
the ROB out of order, their results are always written at retire- 
ment time to the real-state x86 registers in program order. 
Likewise, as results are written from the load/store units to the 
store buffer out of order, they are always written at retirement 
time to the data cache and/or memory in program order. 

An x86 instruction is said to retire when the ROB or store 
buffer writes the operands for all of its ROPs, in program 
order, to the x86 real-state registers or the data cache. At the 
point of retirement, the register file and data cache fully 
reflect the execution of an instruction. Any associated excep- 
tions are recognized (the ROB facilitates precise exception 
handling), any external interrupts that were latched or are cur- 
rently held asserted are recognized, and the instruction 
pointer is updated. For instructions that store an operand to 
memory, retirement is the time at which the store is guaran- 
teed to be written externally. When a pipeline invalidation 
(flush) occurs, it does so at the retirement stage, causing all 
instructions in the pipeline that have not reached the retire- 
ment stage to be invalidated. 

The retirement stage is also called the instruction-retirement 
boundary, or simply instruction boundary. The processor can 
retire up to four instructions per processor clock. Thus, the 
next set of up to four instructions that are candidates to retire 
determines the next instruction boundary at which an external 
interrupt can be recognized. 

Only one store from the store buffer can be among the set of up 
to four instructions that retire simultaneously. If the set of 
retirement candidates in any clock includes more than one 
store, only those instructions up to (but not including) the sec- 
ond store will retire. The remaining stores occur one at a time, 
in their queued order, during subsequent retire cycles. 
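The store restriction on retirement can be expressed as a small scheduling function. Each call takes the queue of completed instructions in program order, retires at most four of them, and stops just before a second store. The sketch assumes every candidate has already completed; a real ROB would also stop at the first instruction that has not. The function name and the "st" naming convention for stores are hypothetical.

```python
from typing import List

MAX_RETIRE = 4  # up to four instructions retire per processor clock


def retire_one_clock(queue: List[str]) -> List[str]:
    """Remove and return the instructions that retire this clock.

    `queue` holds completed instructions in program order; entries
    beginning with "st" are stores (a naming convention for this sketch).
    """
    retired: List[str] = []
    stores = 0
    for insn in queue[:MAX_RETIRE]:
        if insn.startswith("st"):
            if stores == 1:
                break  # a second store waits for a later clock
            stores += 1
        retired.append(insn)
    del queue[:len(retired)]
    return retired


q = ["add", "st1", "sub", "st2", "st3", "or"]
clock = 0
while q:
    clock += 1
    print(clock, retire_one_clock(q))
# 1 ['add', 'st1', 'sub']
# 2 ['st2']
# 3 ['st3', 'or']
```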
2.3 Cache Organization and Management



The performance of the execution pipeline is enhanced by the 
processor’s on-chip, 16-Kbyte instruction cache and 8-Kbyte 
data cache. Both caches are linearly addressed and each has 
two associated tag directories, one for linear tags and one for 
physical tags. 

Linearly addressed caches avoid linear-to-physical address 
translation through the TLB and can be faster than physically 
addressed caches. Cache accesses in the AMD-K5 processor 
take one clock. The physical tags are only accessed during 
cache misses and snoops. By comparison, accesses in the Pen- 
tium processor’s physically tagged caches take one or two 
clocks, depending on the type of operand being accessed (oper- 
ands used in address calculations for the next cache access 
take two clocks). Because most x86 instructions access memory, they benefit greatly from having their operands cached, and the faster cache-access time on the AMD-K5 processor is a performance advantage.

The enabling and operating modes for the caches are software-controlled by the CD and NW bits of CR0. When disabled, both caches are locked (no new lines are filled). They are accessed in all operating modes,
and the processor can still hit in a cache that has not been 
invalidated, even if software has turned the caches off. These 
mechanisms work the same on both the AMD-K5 and Pentium 
processors. 
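To illustrate the CR0 controls, the following C sketch decodes the two cache-control bits. It assumes the standard x86 bit positions (NW is bit 29, CD is bit 30) and the Pentium-class rule that the combination CD=0, NW=1 is invalid. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_NW (UINT32_C(1) << 29)   /* NW: not write-through */
#define CR0_CD (UINT32_C(1) << 30)   /* CD: cache disable */

/* CD=0 with NW=1 is an invalid combination (assumed to fault when
 * loaded, as on the Pentium processor). */
bool cr0_cache_bits_valid(uint32_t cr0)
{
    return !((cr0 & CR0_NW) && !(cr0 & CR0_CD));
}

/* New lines are allocated only while CD is clear. Lines that are
 * already valid can still hit with CD set, until software invalidates
 * the caches. */
bool cr0_line_fills_enabled(uint32_t cr0)
{
    assert(cr0_cache_bits_valid(cr0));
    return (cr0 & CR0_CD) == 0;
}
```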

Any area of memory may be cached. However, the processor 
prevents caching of locked operations and TLB reads; the operating system can prevent caching of certain pages by setting the PCD bit in page-directory and/or page-table entries; and
system logic can prevent caching of certain bus cycles by 
negating the KEN input signal on the first BRDY. 
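The four conditions above can be combined into a single predicate. This is a hypothetical sketch: `read_cycle_t` and its fields are illustrative names, KEN is modeled as already sampled at the first BRDY, and the effect of the CR0 cache-control bits is not included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one memory read; the field names are
 * illustrative and do not come from any AMD interface. */
typedef struct {
    bool locked;        /* part of a locked operation */
    bool tlb_read;      /* page-directory or page-table read for a TLB reload */
    bool pcd;           /* PCD bit from the page-directory or page-table entry */
    bool ken_asserted;  /* KEN asserted when sampled with the first BRDY */
} read_cycle_t;

bool read_is_cacheable(const read_cycle_t *r)
{
    assert(r != NULL);
    if (r->locked || r->tlb_read)   /* the processor never caches these */
        return false;
    if (r->pcd)                      /* the operating system disabled caching */
        return false;
    return r->ken_asserted;          /* system logic makes the final decision */
}
```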

The processor implements a requested-word-first protocol for 
line fills in both caches. Upon receiving the first 8-byte quad- 
word, execution continues while the remainder of the line is 
loaded into the cache. Both caches, however, are blocking — a 
read hit or miss after a read miss waits until the prior miss fills 
the cache. Since read misses are rare, relative to read hits, 
cache blocking has little effect on overall performance. 
The following sections describe the basic architecture and
resources of the processor’s internal caches. For information 
about how the system software and hardware control cache 
configuration and coherency, see Section 6.2 on page 6-8. 



2.3.1 Instruction Cache 

The instruction cache has the following characteristics: 

■ 16 Kbytes 

■ 32-byte line size 

■ Four-way, set associative 

■ Dual-tagged (linear and physical) 

■ Single-clock access 

■ Supports 16-byte split-line accesses 

■ Requested-word-first line-fill protocol 

■ Five predecode bits per instruction byte 

■ Round-robin replacement policy 

■ Read-only, invalidate on write hit 
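The cache parameters listed above determine how a linear address splits into line offset, set index, and linear tag. The sketch below assumes the conventional decomposition, with the offset in the low bits and the index immediately above it, and uses the 32-byte line size listed for both caches. This section does not state which address bits form the index, so treat the bit assignment as an assumption. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of a set-associative cache; all values are powers of two. */
typedef struct {
    uint32_t size_bytes;
    uint32_t ways;
    uint32_t line_bytes;
} cache_geom_t;

uint32_t cache_sets(const cache_geom_t *g)
{
    assert(g->ways != 0 && g->line_bytes != 0);
    return g->size_bytes / (g->ways * g->line_bytes);
}

/* Set index: the address bits immediately above the line offset. */
uint32_t cache_set_index(const cache_geom_t *g, uint32_t linear)
{
    return (linear / g->line_bytes) % cache_sets(g);
}

/* Linear tag: the remaining high-order address bits. */
uint32_t cache_linear_tag(const cache_geom_t *g, uint32_t linear)
{
    return linear / (g->line_bytes * cache_sets(g));
}
```

Under these assumptions, the 16-Kbyte instruction cache has 128 sets (index from address bits 5 through 11), and the 8-Kbyte data cache has 64 sets.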

Instruction-cache accesses can be to any 16 bytes within a sin- 
gle 32-byte line or they can be split into two 8-byte accesses 
across two contiguous lines. 

Split-line fetches can provide instructions from sequential 
lines in a single clock. This keeps decode logic supplied with a 
steady stream of bytes. Instruction fetches can read any 16 
bytes of a single line or — in a split-line fetch — the high 8 bytes 
of the first line and the low 8 bytes of the next sequential line 
(index + 1 as determined by the A4 address bit), starting on 
either an odd or even line. 

Instruction-cache lines have only two coherency states (valid 
or invalid) rather than the four MESI (modified, exclusive, 
shared, invalid) coherency states of data-cache lines. Only two 
states are needed because these lines are only read, never writ- 
ten. In addition to holding instructions, each instruction-cache 
line holds 5 predecode bits per instruction byte. The informa- 
tion contained in these bits is described in Section 2.1 on page 
2-3. 
Parts of the current code-segment descriptor are maintained in
the instruction cache. This allows the cache to translate logical 
addresses for branches and other prefetch targets to linear 
address tags for the incoming cache-line fills. 

Details on the instruction-cache storage formats and testing 
are given in Section 7.4 on page 7-7. 



2.3.2 Data Cache 

The data cache has the following characteristics: 

■ 8 Kbytes 

■ 32-byte line size 

■ Four-way, set associative 

■ Four banks 

■ Dual-tagged (linear and physical) 

■ Byte-addressable 

■ Single-clock access 

■ Two true linear-tag ports — two parallel accesses per clock 

■ Two logical data ports (one read-only, one read/write) — two 
parallel accesses per clock, if not to the same bank 

■ MESI cache-coherency protocol (maintained by physical 
tags) 

■ Requested-word-first line-fill protocol 

■ Round-robin replacement policy 

■ Read/write (writeback or writethrough modes) 

The data cache overcomes load/store bottlenecks by supporting simultaneous accesses to two locations in a single clock, if the locations are in separate banks. Each of the four cache banks contains eight bytes, or one-fourth of a 32-byte cache line. They
are interleaved on a four-byte boundary. One instruction can 
be accessing bank 0 (bytes 0-3 and 16-19), while another 
instruction is accessing bank 1, 2, or 3 (bytes 4-7 and 20-23, 
8-11 and 24-27, and 12-15 and 28-31 respectively). 
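The bank interleaving described above reduces to two address bits. This is a minimal sketch that assumes each access stays within one 4-byte chunk; accesses that straddle a bank boundary are not modeled, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data-cache bank for a byte address: banks are interleaved on 4-byte
 * boundaries, so address bits 3:2 select one of the four banks. */
unsigned dcache_bank(uint32_t addr)
{
    return (addr >> 2) & 3u;
}

/* Two accesses can proceed in the same clock only if they use
 * different banks. */
bool dcache_can_dual_access(uint32_t a, uint32_t b)
{
    return dcache_bank(a) != dcache_bank(b);
}
```

For example, bytes 0 and 16 both fall in bank 0, so two accesses to them conflict even though they are at different offsets in the line.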

Entries in the data cache are real-state operands. A load occurs when one of the load/store units reads an operand from the data cache or memory. A store occurs at the retirement pipeline stage when an entry from the speculative-state store buffer, which resides between the load/store units and the data cache, moves to the real-state data cache or memory.

Details on the data-cache storage formats and testing are given in Section 7.4 on page 7-7.

2.3.3 Cache Tags

The processor’s caches are dual-tagged. That is, the processor 
maintains two sets of tags — linear and physical — for each line 
in the two caches. The linear tags are stored in the instruction 
and data caches. The physical tags are stored in the memory 
management unit (MMU), where the TLB is also located. The 
physical-tag directories for each cache have one port. 

Linear tags are read for all accesses to the instruction and data 
caches. All read misses, memory writes, and snooping — both 
external inquire cycles and automatic internal snooping — go 
through the physical tags. The MESI cache-coherency state is
recorded in the physical tags. 

Accesses to the data-cache physical tags add two clocks to the 
one-clock linear-tag access. Accesses to the instruction-cache 
physical tags add three clocks to the one-clock linear-tag 
access. Thus, physical-tag accesses take a total of three clocks 
for the data cache or four clocks for the instruction cache, but 
they occur infrequently. For write hits to the data cache, how- 
ever, the additional latency for accessing the physical tags 
(needed to determine the MESI state) is transparent to program execution because write hits are pipelined and can occur
at a sustained rate of one per clock. 

There is a corresponding physical tag for each linear tag. Two 
or more linear addresses can be aliased to a single physical 
address. When the processor detects an aliased access to the 
store buffer, the TLB and physical tags forward the access 
directly from the store buffer without depending on a linear- 
tag match in the data cache. 

The linear tags for both caches are invalidated whenever paging is turned on or off, or when CR3 (the page-directory base register) is loaded, except that during x86-architecture task switches the linear tags are invalidated only if the current and new values of CR3 differ. When linear tags are invalidated, many or all of the cached lines may still be valid, but accesses miss in the linear tags and go through the MMU to the physical tags. If an access misses the linear tags but hits in the physical tags, the processor restores the linear tag using the linear address for the access. This is called a cache-tag recovery. Revalidating the linear tag adds no time beyond that of the physical-tag access itself.

The linear tags for both caches are invalidated during physical-tag invalidation, or when the RESET or INIT input signal is asserted. The linear and physical tags for both caches are invalidated when the FLUSH input signal is asserted or when the INVD or WBINVD instruction is executed.

2.3.4 Cache-Line Fills

Memory reads that miss in the instruction or data cache gener- 
ate read-allocate operations. These begin with an attempt to 
find an invalid line in one of the four cache ways for the 
accessed index. If an invalid line cannot be found in one of the 
four ways for the index, a line is pseudo-randomly selected for 
replacement from one of the four ways. Then the processor fills 
the line by driving a four-transfer burst cycle on the bus, 
aligned on 32-byte boundaries, with the target quadword 
(qword) delivered first. 
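The line-fill order can be computed from the requested address. This section states only that the target quadword is delivered first; for the remaining three transfers, the sketch below assumes the Pentium-style interleaved burst order, in which transfer i addresses (first quadword offset XOR 8*i) within the 32-byte line. The function name is hypothetical.

```c
#include <stdint.h>

/* Fills order[] with the four quadword addresses of a 32-byte line fill,
 * starting with the quadword that holds the requested byte. The remaining
 * order assumes the Pentium-style interleaved burst: transfer i addresses
 * (first quadword offset XOR 8*i) within the line. */
void burst_fill_order(uint32_t requested, uint32_t order[4])
{
    uint32_t line  = requested & ~UINT32_C(31);    /* 32-byte aligned base */
    uint32_t first = requested & UINT32_C(0x18);   /* quadword offset 0, 8, 16, or 24 */

    for (uint32_t i = 0; i < 4; i++)
        order[i] = line | (first ^ (i * 8));
}
```

Under this assumption, a request for address 100Dh produces transfers at 1008h, 1000h, 1018h, and 1010h.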

Instruction-cache line fills initiate four 8-byte transfers from 
memory (one burst cycle) on the bus. All 32 bytes go through 
the prefetch cache (which has two 32-byte lines) to the instruc- 
tion cache and byte queue, with x86 instruction predecoding 
performed on the fly. 

Data-cache line fills also initiate four 8-byte transfers on the 
bus. If a shared or exclusive line is being replaced prior to the 
line fill, the first two 8-byte qwords fill half of the cache line, 
while the accessed data item is simultaneously forwarded 
through the load/store unit to the ROB and execution units. 
Then the remaining two qwords arrive and fill the other half of 
the cache line. When the cache line is completely filled, the 
state of the line is updated. If the line being filled is replacing 
a modified line, the prior contents of the line are copied to a 32- 
byte writeback (copyback) buffer in the bus interface unit 
while the new line is being read. 
2.3.5 Cache Coherency

The processor’s cache-coherency mechanism is based on real 
(non-speculative) state. Everything that accesses main memory 
has the same view of that memory, which is never modified 
speculatively. The contents of the processor’s data cache are 
always real-state. Furthermore, on the AMD-K5 processor, 
writes to both memory and the data cache are always done in 
program order, irrespective of the state of the EWBE input sig- 
nal. 

The processor’s data cache implements coherency with the 
MESI (Modified, Exclusive, Shared, Invalid) protocol. The 
instruction cache, which is read-only, has no write-related 
states. The instruction cache implements coherency with only 
a valid bit, which in effect works like a shared-invalid subset of 
the MESI protocol. The coherency state bits are stored in the 
physical tags for each cache. The physical tags can be accessed 
by external logic (using inquire cycles) or the processor (for 
internal snoops) in parallel with accesses to the linear tag by 
programs running on the processor. 

Table 2-2 shows all possible cache-line states before and after 
program-generated accesses to individual cache lines. The 
table includes the correspondence between MESI states and 
writethrough or writeback states for lines in the data cache. 
Table 2-3 shows all possible cache-line states before and after 
cache snoop or invalidation operations performed with inquire 
cycles. Together, these tables show all of the conditions for 
writethroughs and writebacks to memory. 
Table 2-2. Cache States for Read and Write Accesses

| Type | Access | Tags (1) | Cache State Before Access (5) | Access Type (2) | MESI State After Access (5) | Writeback-Writethrough State After Access |
|---|---|---|---|---|---|---|
| Cache Read | Read Miss | Linear | invalid | single read | invalid | invalid |
| Cache Read | Read Miss | Linear | invalid, cacheable (3) | burst read (line fill) | shared or exclusive (4) | writethrough or writeback (4) |
| Cache Read | Read Hit | Linear | shared | - | shared | writethrough |
| Cache Read | Read Hit | Linear | exclusive | - | exclusive | writeback |
| Cache Read | Read Hit | Linear | modified | - | modified | writeback |
| Cache Write | Write Miss | Linear | invalid | single write | invalid | invalid |
| Cache Write | Write Hit | Linear | shared | cache update and single write | shared or exclusive (4) | writethrough or writeback (4) |
| Cache Write | Write Hit | Linear | exclusive or modified | cache update | modified | writeback |

Notes:

1. Linear tags are masked by A20M; physical tags are not.

2. Single read, single write, cache update, and writethrough = 1 to 8 bytes. Line fill = 32 bytes.

3. If CACHE and KEN are Low.

4. If PWT is Low and WB/WT is High.

5. MESI state is stored in the physical tags. Instruction-cache state consists only of valid (shared) or invalid, and there are no write-related states.

- Not applicable or none. 
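Table 2-2 can be read as a small state machine for the data cache. The following C sketch encodes its rows. The type and function names are hypothetical; `cacheable` stands for the condition in note 3, and `wb_allowed` stands for the condition in note 4.

```c
#include <stdbool.h>

typedef enum { MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED } mesi_t;
typedef enum { ACCESS_READ, ACCESS_WRITE } access_t;
typedef enum {
    BUS_NONE,          /* "-": satisfied from the cache */
    BUS_SINGLE_READ,   /* 1 to 8 bytes */
    BUS_LINE_FILL,     /* 32-byte burst read */
    BUS_SINGLE_WRITE   /* 1 to 8 bytes (write miss or writethrough) */
} bus_cycle_t;

typedef struct {
    bool cacheable;    /* note 3: CACHE and KEN are Low */
    bool wb_allowed;   /* note 4: PWT is Low and WB/WT is High */
} access_attrs_t;

/* Returns the bus cycle for one program access and updates *state,
 * following the rows of Table 2-2. */
bus_cycle_t dcache_access(mesi_t *state, access_t kind, access_attrs_t a)
{
    switch (*state) {
    case MESI_INVALID:
        if (kind == ACCESS_WRITE)
            return BUS_SINGLE_WRITE;            /* write miss: no allocation */
        if (!a.cacheable)
            return BUS_SINGLE_READ;
        *state = a.wb_allowed ? MESI_EXCLUSIVE : MESI_SHARED;
        return BUS_LINE_FILL;
    case MESI_SHARED:
        if (kind == ACCESS_READ)
            return BUS_NONE;
        *state = a.wb_allowed ? MESI_EXCLUSIVE : MESI_SHARED;
        return BUS_SINGLE_WRITE;                /* cache update plus writethrough */
    case MESI_EXCLUSIVE:
    case MESI_MODIFIED:
        if (kind == ACCESS_WRITE)
            *state = MESI_MODIFIED;             /* cache update only */
        return BUS_NONE;
    }
    return BUS_NONE;
}
```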
Table 2-3. Cache States for Snoops, Invalidation, and Replacements

| Type of Operation | Tags (1) | Cache State Before Operation (3) | Memory Access (2) | MESI State After Operation (3) | Writeback-Writethrough State After Operation |
|---|---|---|---|---|---|
| Inquire Cycle, INV=0 | Physical | shared or exclusive | - | shared | writethrough |
| Inquire Cycle, INV=1 | Physical | shared or exclusive | - | invalid | invalid |
| Inquire Cycle, INV=0 | Physical | modified | burst write (writeback) | shared | writethrough |
| Inquire Cycle, INV=1 | Physical | modified | burst write (writeback) | invalid | invalid |
| Internal Snoop | Physical | shared or exclusive | - | invalid | invalid |
| Internal Snoop | Physical | modified | burst write (writeback) | invalid | invalid |
| FLUSH Signal | Physical | shared or exclusive | - | invalid | invalid |
| FLUSH Signal | Physical | modified | burst write (writeback) | invalid | invalid |
| WBINVD Instruction | Physical | shared or exclusive | - | invalid | invalid |
| WBINVD Instruction | Physical | modified | burst write (writeback) | invalid | invalid |
| INVD Instruction | - | - | - | invalid | invalid |
| Cache-Line Replacement | Physical | shared or exclusive | - | depends on replacement-line characteristics | depends on replacement-line characteristics |
| Cache-Line Replacement | Physical | modified | burst write (writeback) | depends on replacement-line characteristics | depends on replacement-line characteristics |

Notes:

1. Linear tags are masked by A20M; physical tags are not.

2. Writeback = 32 bytes.

3. MESI state is stored in the physical tags. Instruction-cache state consists only of valid (shared) or invalid, and there are no write-related states.

- Not applicable or none. 
2.3.6 Snooping

The term snooping commonly refers to at least three different 
actions, only two of which are supported by the AMD-K5 and 
Pentium processors: 

■ Inquire Cycles — These are bus cycles initiated by external 
logic that cause the processor to look up an address in its 
physical cache tags. Both the AMD-K5 and Pentium proces- 
sors support inquire cycles. 

■ Internal Snooping — This is initiated by the processor (rather 
than external logic) during certain cache accesses. Internal 
snooping detects self-modifying code. Both the AMD-K5 and
Pentium processors support internal snooping. 

■ Bus Watching — Some caching devices watch their address 
and data buses while they are held off the bus, comparing 
addresses driven by another bus master with their internal 
cache tags and optionally updating their cached lines on the 
fly during writebacks by the other master. Neither the AMD-K5 nor the Pentium processor supports bus watching.

Table 2-4 shows the conditions under which snooping occurs in 
the AMD-K5 processor and the resources that are snooped. All 
such snooping is done in the processor’s physical tags, in paral- 
lel with the processor’s own accesses to the linear tags. Thus, 
there is no execution-performance penalty for snooping. 

Inquire Cycles

In systems with multiple caching masters, external logic maintains cache coherency by driving inquire cycles to the processor. System logic initiates inquire cycles by asserting AHOLD, BOFF, or HOLD to obtain control of the address bus, and then driving EADS, INV, and an inquire address. Such bus cycles cause the processor to compare the physical tags for both its instruction and data caches with the inquire address. If the compare hits a shared or exclusive line in the data cache or a valid line in the instruction cache, the processor asserts HIT. If the compare hits a modified line in the data cache, the processor asserts HITM.

The resulting state of a cache line that is hit depends on the state of the INV signal at the time of the inquire cycle. In either case, a modified data-cache line is first written back. If INV is negated, the line remains in or transitions to the shared (or valid) state. If INV is asserted, the line is invalidated.
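The inquire-cycle rules above can be summarized as a function of the line's MESI state and the INV signal. This sketch covers only the data-cache side and uses hypothetical names. It follows the HIT/HITM assignment given in this section and the writeback behavior in Table 2-3, and it does not model bus timing or the instruction cache.

```c
#include <stdbool.h>

typedef enum { MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED } mesi_t;

typedef struct {
    bool hit;        /* HIT driven for a shared or exclusive line */
    bool hitm;       /* HITM driven for a modified line */
    bool writeback;  /* modified line written back with a burst write */
    mesi_t next;     /* line state after the inquire cycle */
} inquire_result_t;

/* Data-cache response to one inquire cycle. */
inquire_result_t dcache_inquire(mesi_t state, bool inv)
{
    inquire_result_t r = { false, false, false, state };

    if (state == MESI_INVALID)
        return r;                    /* miss: nothing changes */

    if (state == MESI_MODIFIED) {
        r.hitm = true;
        r.writeback = true;          /* written back whether or not INV is asserted */
    } else {
        r.hit = true;
    }
    r.next = inv ? MESI_INVALID : MESI_SHARED;
    return r;
}
```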
Table 2-4. Snoop Action

| Origin of Snoop | Type of Access | Instruction Cache | Prefetch Cache | Data Cache | Store Buffer | Writeback Buffers |
|---|---|---|---|---|---|---|
| External | Inquire cycle | yes (1) | yes | yes (1) | no | yes (1) |
| Internal | Instruction-cache read miss | - | - | yes (2) | yes (2) | yes (2) |
| Internal | Instruction-cache read hit | - | - | no | no | no |
| Internal | Data-cache read miss | yes (3) | yes (3) | - | - | - |
| Internal | Data-cache read hit | no | no | - | - | - |
| Internal | Data-cache write miss | yes (4) | yes (4) | - | - | - |
| Internal | Data-cache write hit | no | no | - | - | - |

Notes:

1. The processor's response to a snoop hit depends on the state of the INV input signal and the state of the cache line, as follows. For instructions, if INV is negated, the line remains invalid or shared; if INV is asserted, the line is invalidated. For data, if INV is negated, valid lines remain in or transition to the shared state, a modified data-cache line is written back before the line is marked shared (with HITM asserted), and invalid lines remain invalid. For data, if INV is asserted, the line is marked invalid; modified lines are written back before invalidation.

2. If the snoop hits a line in the data cache, store buffer, or writeback buffer, the line is written back (if modified) and invalidated. Then the instruction-cache read is performed again. If the line is modified, a copy of the writeback data is passed directly to the instruction cache, thus avoiding a line-fill bus cycle after the writeback bus cycle.

3. If the snoop hits a line in the instruction cache, prefetch cache, or line-fill buffer, the line stays valid and the data-cache read is performed again, but as a single, non-cacheable read.

4. If the snoop hits a line in the instruction cache, prefetch cache, or line-fill buffer, the line is invalidated and the data-cache write is performed.

- Not applicable. 



Internal Snooping

The processor automatically snoops its instruction cache during read or write misses to its data cache, and it snoops its data cache during read misses to its instruction cache. It does this to detect the presence of self-modifying code. Table 2-4 summarizes the actions taken during this internal snooping.
If an internal snoop hits its target, the processor does the following:

■ During Instruction-Cache Read Miss — The line in the data 
cache, store buffer, or writeback buffer is written back (if 
modified) and invalidated, and the instruction-cache read is 
performed again. If the data-cache line was modified, a 
copy of the writeback data is passed directly to the instruc- 
tion cache, thus avoiding a line-fill bus cycle after the write- 
back bus cycle. 

■ During Data-Cache Read Miss — The line in the instruction 
cache, prefetch cache, or line-fill buffer stays valid, and the 
data-cache read is performed as a single, non-cacheable 
read. 

■ During Data-Cache Write Miss — The line in the instruction 
cache, prefetch cache, or line-fill buffer is invalidated, the 
reorder buffer invalidates all instructions in the pipeline 
following the instruction that initiated the snoop, and the 
data-cache write is performed. 

The AMD-K5 processor, like the 486 processor but unlike the 
Pentium processor, requires a jump (near or far) after a self- 
modifying write to clear the prefetch cache. However, both the 
AMD-K5 and the Pentium processors require a serializing 
instruction after self-modifying code whose physical address is 
aliased to multiple linear addresses. 



2.3.7 Buffers 



Several buffers are associated with the instruction and data 
caches, as described below. 

Line-Fill Buffers

The processor has two 16-byte line-fill buffers in the bus interface unit, one of which is used during instruction-cache line fills and the other during data-cache line fills. Each buffer holds half of the 32-byte burst cycle that the processor drives in response to a cacheable fetch miss.

Instruction-cache lines are 16 bytes wide. During fetch misses, the first 16 bytes of the burst go through the prefetch cache to the instruction cache and/or byte queue. The remaining 16 bytes from the 32-byte burst cycle, if they are not used immediately thereafter to fill the prefetch cache, are held in a 16-byte line-fill buffer in the bus interface unit for a possible future access. As shown in Table 2-4, the line-fill buffer for the instruction cache is snooped internally during read or write misses in the data cache, but it is not snooped during inquire cycles. The line-fill buffer for the data cache, unlike the instruction-cache buffer, is never snooped and for this reason does not appear in Table 2-4.

Prefetch Cache

The processor prefetches instructions during fetch-stage misses in the instruction cache, as described in Section 2.1 on page 2-3. When a miss occurs, the processor initiates a 32-byte access for a 16-byte line fill and additional sequentially addressed bytes to fill the prefetch cache. During non-cacheable accesses, the fetch logic fetches directly from the prefetch cache.

As shown in Table 2-4 on page 2-22, the prefetch cache is snooped internally during read or write misses in the data cache and during inquire cycles.

Store Buffer

The Pentium processor implements a write buffer in which
real-state data writes can be buffered, waiting for access to the 
bus, and in which certain types of cacheable read cycles on the 
bus are promoted ahead of certain types of write cycles when 
the EWBE signal is asserted. The AMD-K5 processor has no 
such real-state write buffer between its data cache and the bus, 
although it does implement a speculative-state, 4-entry, 4-byte- 
wide store buffer between the two load/store execution units 
and the data cache. 

The store buffer can contain both speculative- and real-state 
data. Each entry in the store buffer is in speculative state until 
the associated ROP is retired, after which the data is trans- 
ferred to the data cache and/or memory, both of which repre- 
sent the real (non-speculative) state of data. A store occurs at 
the retirement stage of the pipeline, when the processor writes 
an entry from the store buffer to the data cache and/or mem- 
ory. For non-cacheable stores, the processor writes directly 
from the store buffer to the bus interface, at which point the 
store becomes real-state. 

As shown in Table 2-4 on page 2-22, the store buffer is not 
snooped during inquire cycles. When external logic drives an 
inquire cycle, the processor’s response depends only on the 
contents of the data cache at that time (that is, only on its real 
state). Subsequent stores to that line — be they in the store 
buffer, load/store execution units, reservation stations, decode
unit, or prefetch cache — are not relevant to an inquire cycle or 
internal snoop. Such stores are speculative and might never 
occur, due to a branch misprediction, an interrupt, or other 
intervening event. 

As a buffered store leaves the store buffer to update the data 
cache and/or memory, the processor checks the location’s 
MESI state in the physical tags and observes the MESI update 
rules for that state. For example, if a buffered store were going 
to hit an exclusive line in the data cache when first placed in 
the store buffer, but the line’s MESI state was changed from 
exclusive to shared by a subsequent inquire cycle while the 
store waited in the store buffer, the store would see a shared 
state on being transferred to the data cache, and it would 
become a writethrough, going externally to main memory at the 
same time that it updates the data cache. 

Writeback Buffer

The processor has a 1-entry, 32-byte-wide writeback (copyback) buffer in the bus interface unit for replacements and
invalidations. The buffer is used for writebacks of modified 
data in the data cache due to one of the following: 

■ Cache-line replacement during data-cache read miss 

■ WBINVD instruction 

■ FLUSH signal 

During cache-line replacements, the memory read cycle for the 
new cache line is initiated on the bus before the contents of the 
modified line to be replaced are copied into the writeback 
buffer. When the cache-line fill is completed, the contents of 
the writeback buffer are written to memory. 

Writethroughs from the data cache do not go through a buffer. 
These transfers are between 1 and 8 bytes in length and they 
go directly onto the bus from the store buffer. 

As shown in Table 2-4 on page 2-22, the writeback buffer is 
snooped internally during instruction-cache read misses and 
during inquire cycles. 
Snoop Writeback Buffer

In addition to the replacement and invalidation writeback buffer, the processor also has a 1-entry, 32-byte-wide snoop writeback buffer in the bus interface unit that is used for writebacks due to one of the following:

■ Internal snoop during an instruction-cache read miss

■ External inquire cycle in which the INV signal is asserted

A modified data-cache line can be replaced in parallel with a snoop-hit invalidation to a modified line because the writebacks go to separate buffers.

2.4 Memory Management Unit (MMU)



The MMU supports standard x86 demand-paged virtual memo- 
ry by translating linear addresses to physical addresses. To 
speed this process, the most recently accessed address transla- 
tions are stored in one of two translation lookaside buffers 
(TLBs), one for mapping 4-Kbyte pages and another for map- 
ping 4-Mbyte pages. Mappings to 4-Kbyte and 4-Mbyte pages 
can be intermixed in a given page directory, the base of which 
is pointed to by the contents of control register 3 (CR3). 

During memory accesses, the MMU receives a linear address 
and searches the TLBs for a corresponding physical address. If 
found, the physical address is passed to the physical-tag directory for a validity check. If no translation is present in the TLBs (a TLB miss), the MMU searches the page directory and page
tables in memory. If found, the MMU loads the translation into 
the appropriate TLB. If not found, the processor generates a 
page fault. 
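The TLB-miss path above amounts to a walk of the two-level x86 page tables. The following C sketch assumes the standard 32-bit x86 entry formats (present bit 0, and page-size bit 7 in the page-directory entry for 4-Mbyte pages) and models physical memory as a small array. The names are hypothetical, and privilege checks and Accessed/Dirty updates are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PG_P  UINT32_C(0x001)   /* present */
#define PG_PS UINT32_C(0x080)   /* PDE page size: 1 = 4-Mbyte page */

/* Toy physical memory standing in for DRAM (16 Kbytes, word addressed). */
static uint32_t phys_mem[4096];

static uint32_t read_phys32(uint32_t pa)
{
    assert((pa >> 2) < sizeof phys_mem / sizeof phys_mem[0]);
    return phys_mem[pa >> 2];
}

/* Two-level walk; pse mirrors the 4-Mbyte page enable in CR4.
 * Returns false where the processor would raise a page fault. */
bool page_walk(uint32_t cr3, bool pse, uint32_t linear, uint32_t *pa)
{
    uint32_t pde = read_phys32((cr3 & ~UINT32_C(0xFFF)) + ((linear >> 22) << 2));
    if (!(pde & PG_P))
        return false;

    if (pse && (pde & PG_PS)) {               /* 4-Mbyte page: no page table */
        *pa = (pde & ~UINT32_C(0x3FFFFF)) | (linear & UINT32_C(0x3FFFFF));
        return true;
    }

    uint32_t pte = read_phys32((pde & ~UINT32_C(0xFFF)) + (((linear >> 12) & 0x3FF) << 2));
    if (!(pte & PG_P))
        return false;

    *pa = (pte & ~UINT32_C(0xFFF)) | (linear & UINT32_C(0xFFF));
    return true;
}
```

With a page directory at 1000h whose entry 1 maps a 4-Mbyte page at 0080_0000h, linear address 0041_2345h translates to 0081_2345h without touching a page table.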

2.4.1 Storage Model 

The AMD-K5 processor always observes the strongly ordered 
memory-write model. All writes — whether to cache, memory, or 
I/O — are performed in program order, regardless of the state 
of the External Write Buffer Empty (EWBE) input signal. The 
only effect of EWBE on writes is to hold additional writes off 
when the signal is negated. In particular, assertion of EWBE 
does not permit the AMD-K5 processor to observe a weakly 
ordered memory-write model, in which writes to cache may 
occur out of program order with respect to writes on the bus to
memory. 

In a strongly ordered memory-write model, writes to cache and 
memory always appear in program order. In a weakly ordered 
memory-write model, writes to cache and memory can occur 
out of program order (that is, a write to cache can occur before 
a prior write to memory). Weakly ordered systems may per- 
form better, but they can cause problems in systems with multiple caching masters. For example, errors may occur in weakly
ordered systems when a master that is held off the bus contin- 
ues writing to exclusive or modified lines in its internal data 
cache while another master writes to memory. Nevertheless, 
the strongly ordered AMD-K5 processor supports high perfor- 
mance without using weakly ordered memory writes by buffer- 
ing speculative stores in the store buffer. 

2.4.2 Read/Write Reordering 

The processor reorders certain types of cacheable read cycles 
on the bus ahead of certain types of write cycles. Specifically, 
any read that hits in the instruction or data cache is promoted 
ahead of a write in the store buffer if the read is not from the 
same location to which a write in the store buffer is to be writ- 
ten. The reordering allows reads, which dominate the proces- 
sor’s use of the bus in Writeback mode, to take precedence 
over data writes, which normally occur infrequently. The 
EWBE signal has no effect on this reordering of bus cycles. 
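The promotion rule can be expressed as an overlap test against the pending stores. This sketch assumes that "the same location" means any overlapping byte range; the entry layout and function names are hypothetical, and address wraparound at 4 Gbytes is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One pending entry in a hypothetical model of the store buffer. */
typedef struct {
    uint32_t addr;   /* first byte to be written */
    uint32_t size;   /* 1 to 4 bytes (entries are 4 bytes wide) */
} store_entry_t;

static bool ranges_overlap(uint32_t a, uint32_t an, uint32_t b, uint32_t bn)
{
    return a < b + bn && b < a + an;
}

/* A read may be promoted ahead of the buffered stores only if it does not
 * touch any location that one of them is going to write. */
bool read_may_bypass_stores(uint32_t addr, uint32_t size,
                            const store_entry_t *stores, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ranges_overlap(addr, size, stores[i].addr, stores[i].size))
            return false;
    return true;
}
```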

2.4.3 Segmentation 

The instruction cache contains a copy of certain fields in the 
current code-segment descriptor. The information is used dur- 
ing prefetch for segment translation (logical-to-linear 
addresses), thus providing linear-address tags for the instruc- 
tion-cache entries. Likewise, the load/store units hold the cur- 
rent data-segment descriptors, which are used to generate the 
linear address and perform protection checks during data- 
cache accesses. The processor can cache segment descriptors 
in its data cache. 
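Segment translation, as used here, is base plus offset with a limit check. The following is a minimal sketch for an expand-up segment, assuming a hypothetical `seg_cache_t` that holds the descriptor's base and byte-granular limit. Expand-down segments, type checks, and privilege checks are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached copy of the descriptor fields used for translation. */
typedef struct {
    uint32_t base;    /* segment base address */
    uint32_t limit;   /* effective limit in bytes (granularity already applied) */
} seg_cache_t;

/* Logical-to-linear translation for an expand-up segment: every byte of
 * the access must lie at or below the limit; the linear address is
 * base + offset, wrapping modulo 2^32. */
bool seg_to_linear(const seg_cache_t *seg, uint32_t offset, uint32_t size,
                   uint32_t *linear)
{
    if (size == 0 || offset > seg->limit || size - 1 > seg->limit - offset)
        return false;                 /* a protection exception in hardware */
    *linear = seg->base + offset;
    return true;
}
```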
2.4.4 Paging and the TLBs

The processor supports 4-Kbyte and 4-Mbyte paging with two 
separate translation lookaside buffers (TLBs) that work in par- 
allel: 

■ 4-Kbyte Pages — A 128-entry, four-way, set-associative TLB 
that can cover 512 Kbytes of memory space 

■ 4-Mbyte Pages — A four-entry, fully-associative TLB that can 
cover 16 Mbytes of memory space 
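The coverage figures above follow from entries times page size. The sketch below also shows a possible set-index calculation for the four-way 4-Kbyte TLB (128 entries / 4 ways = 32 sets). Taking the index from the low bits of the linear page number is an assumption, because this section does not specify it; the function names are hypothetical.

```c
#include <stdint.h>

/* Memory reach of a TLB: number of entries times page size. */
uint64_t tlb_reach_bytes(uint32_t entries, uint64_t page_bytes)
{
    return (uint64_t)entries * page_bytes;
}

/* Set selected in the 128-entry, four-way 4-Kbyte TLB (32 sets),
 * assuming the index is the low bits of the linear page number. */
uint32_t tlb4k_set(uint32_t linear)
{
    const uint32_t sets = 128 / 4;
    return (linear >> 12) % sets;
}
```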

The TLBs are accessed during cache accesses that miss in the 
linear tags. Each TLB is organized into tag directories (linear- 
address references) and data arrays (physical-address refer- 
ences). The TLB entries also contain bits used to check privi- 
lege and access rights. Because the caches are linearly 
addressed, however, cache accesses do not go through the TLB. 
The cache accesses are faster because the TLB is not involved. 
Copies of the privilege and access bits from the TLB entries 
are loaded into the caches when the cache lines are filled. If a 
privilege-level violation is detected during a cache access, the 
TLB is accessed, and it alone can issue a page-related excep- 
tion. 

TLB invalidations (flushes) are done in the standard ways: a 
MOV to CR3, which loads a new page-directory base address, or the
INVLPG instruction, which invalidates a single TLB entry. 

Both the 4-Kbyte and 4-Mbyte TLBs support global pages, 
which remain in the TLBs during such TLB invalidations when 
the global-page extension is enabled. 
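The interaction between global pages and CR3 loads can be sketched as follows. The entry layout and names are hypothetical, `pge` stands for the global-page extension being enabled, and only the MOV-to-CR3 case is modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal TLB entry for this sketch; real entries also hold access rights. */
typedef struct {
    bool     valid;
    bool     global;        /* global bit copied from the page-table entry */
    uint32_t linear_page;   /* linear address >> 12 */
    uint32_t phys_page;     /* physical address >> 12 */
} tlb_entry_t;

/* MOV to CR3: invalidate every entry except global ones, which are kept
 * only when the global-page extension is enabled. */
void tlb_flush_on_cr3_load(tlb_entry_t *tlb, size_t n, bool pge)
{
    for (size_t i = 0; i < n; i++)
        if (!(pge && tlb[i].global))
            tlb[i].valid = false;
}
```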

When a TLB miss or fault occurs during a prefetch, bits reflect- 
ing this are passed via the prefetch cache to the decode logic 
during fetch misses so that microcode can serialize the pipe- 
line and initiate the TLB reload nonspeculatively. TLB replace- 
ment is done using a pseudo-random algorithm. The processor 
never caches TLB loads, regardless of the state of the PCD and 
PWT bits, and it does not do speculative TLB reloads. A page- 
fault handler, however, may cache page-table entries in the 
data cache. During a TLB reload, the physical cache tags are 
snooped for the page-table entry (PTE). A hit on a modified line 
causes that line to be written back to memory. The TLB then 
completes the read from memory. The TLB always performs 
reloads from memory, regardless of whether a page-directory 
entry (PDE) or page-table entry (PTE) is in the data cache. If 
the TLB reload involves a write to memory to set the PDE
Accessed or Dirty bit, a hit during the physical-tag snoop 
causes the cache line to be invalidated. 

Details on software configuration for 4-Mbyte paging are given 
in Section 3.1.2 on page 3-5. The global-page option is 
described in Section 3.1.3 on page 3-9. Details on the TLB stor- 
age formats and their testing are given in Section 7.4 on page 
7-7. 
3

Software Environment and Extensions



The AMD-K5 processor is compatible with the instruction set, 
programming model, memory management mechanisms, and 
other software infrastructure supported by the 486 and
Pentium (735\90, 815\100) processors. Operating system and
application software that runs on the Pentium processor can be
executed on the AMD-K5 processor without modification. 
Because the AMD-K5 processor takes a significantly different 
approach to implementing the x86 architecture, some subtle 
differences from the Pentium processor may be visible to sys- 
tem and code developers. These differences are described in 
Appendix A. 

The AMD-K5 processor implements the following extensions to 
the 486 architecture: 

■ 4-Mbyte Page Size 

■ Global Pages 

■ Protected Virtual Extensions 

■ Virtual-8086 Mode Extensions (VME) 

■ Machine-Check Registers and Exceptions 

■ Model-Specific Registers (MSRs) 

■ Time Stamp Counter (TSC) 

■ New Instructions: CPUID, CMPXCHG8B, MOV to and from 
CR4, RDTSC, RDMSR, WRMSR, and RSM 

■ I/O Breakpoints 
The sections that follow provide details on the architectural
extensions visible to system and application software. Some 
sections include pseudo-code algorithms for suggested BIOS 
modifications to support the extensions. Architectural exten- 
sions visible to debug and test software, such as I/O break- 
points, are described in Chapter 7. 

3.1 Control Register 4 (CR4) Extensions 

Control register 4 contains bits that enable or specify many of
the extensions to the 486 architecture. Most of the bits in CR4
are reserved, and all CR4 bits default to 0. Figure 3-1 shows the
format of CR4, and Table 3-1 describes its fields.



[Figure: CR4 bit layout, bits 31-0; the defined fields are listed in Table 3-1]

Figure 3-1. Control Register 4 (CR4)
Table 3-1. Control Register 4 (CR4) Fields

| Bit | Mnemonic | Description | Function |
|---|---|---|---|
| 7 | GPE | Global Page Extension | Enables retention of designated entries in the 4-Kbyte TLB or 4-Mbyte TLB during invalidations. 1 = enabled, 0 = disabled. See Section 3.1.3 on page 3-9 for details. |
| 6 | MCE | Machine-Check Enable | Enables machine-check exceptions. 1 = enabled, 0 = disabled. See Section 3.1.1 on page 3-4 for details. |
| 4 | PSE | Page Size Extension | Enables 4-Mbyte pages. 1 = enabled, 0 = disabled. See Section 3.1.2 on page 3-5 for details. |
| 3 | DE | Debugging Extensions | Enables I/O breakpoints in the DR7-DR0 registers. 1 = enabled, 0 = disabled. See Section 7.5 on page 7-16 for details. |
| 2 | TSD | Time Stamp Disable | Selects privileged (CPL = 0) or non-privileged (CPL > 0) use of the RDTSC instruction, which reads the Time Stamp Counter (TSC). 1 = CPL must be 0, 0 = any CPL. See Section 3.2.3 on page 3-27 for details. |
| 1 | PVI | Protected Virtual Interrupts | Enables hardware support for interrupt virtualization in Protected mode. 1 = enabled, 0 = disabled. See Section 3.1.5 on page 3-24 for details. |
| 0 | VME | Virtual-8086 Mode Extensions | Enables hardware support for interrupt virtualization in Virtual-8086 mode. 1 = enabled, 0 = disabled. See Section 3.1.4 on page 3-12 for details. |
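As an illustration, the bit positions in Table 3-1 can be captured as named masks. The following Python sketch is a model only: the constant and function names are hypothetical, and real system software reads and writes CR4 with MOV instructions at CPL 0, which this code does not do. It treats every bit not listed in Table 3-1 as reserved and refuses to change it.

```python
# Hypothetical names for the CR4 bits listed in Table 3-1.
CR4_VME = 1 << 0  # Virtual-8086 Mode Extensions
CR4_PVI = 1 << 1  # Protected Virtual Interrupts
CR4_TSD = 1 << 2  # Time Stamp Disable
CR4_DE = 1 << 3   # Debugging Extensions
CR4_PSE = 1 << 4  # Page Size Extension
CR4_MCE = 1 << 6  # Machine-Check Enable
CR4_GPE = 1 << 7  # Global Page Extension

CR4_FIELDS = [("VME", CR4_VME), ("PVI", CR4_PVI), ("TSD", CR4_TSD),
              ("DE", CR4_DE), ("PSE", CR4_PSE), ("MCE", CR4_MCE),
              ("GPE", CR4_GPE)]
CR4_DEFINED = sum(mask for _, mask in CR4_FIELDS)  # masks are disjoint


def cr4_update(cr4, set_bits=0, clear_bits=0):
    """Return a new 32-bit CR4 value; reject changes to reserved bits."""
    if (set_bits | clear_bits) & ~CR4_DEFINED:
        raise ValueError("attempt to modify a reserved CR4 bit")
    return ((cr4 | set_bits) & ~clear_bits) & 0xFFFFFFFF


def cr4_decode(cr4):
    """List the mnemonics of the CR4 bits that are set, lowest bit first."""
    return [name for name, mask in CR4_FIELDS if cr4 & mask]


# CR4 starts as all zeros; enable 4-Mbyte pages and global pages.
cr4 = cr4_update(0, set_bits=CR4_PSE | CR4_GPE)
print(hex(cr4), cr4_decode(cr4))  # 0x90 ['PSE', 'GPE']
```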
3.1.1 Machine-Check Exceptions

Bit 6 in CR4, the machine-check enable (MCE) bit, controls 
generation of machine-check exceptions (12h). If enabled by 
the MCE bit, these exceptions are generated when either of 
the following occurs: 

■ System logic asserts BUSCHK to identify a parity or other 
type of bus-cycle error 

■ The processor asserts PCHK while system logic asserts PEN 
to identify an enabled parity error on the D63-D0 data bus 

Whether or not machine-check exceptions are enabled, the 
processor does the following when either type of bus error 
occurs: 

■ Latches the physical address of the failed cycle in its 64-bit 
machine-check address register (MCAR) 

■ Latches the cycle definition of the failed cycle in its 64-bit 
machine-check type register (MCTR) 

Software can read the MCAR and MCTR registers in the excep- 
tion handling routine with the RDMSR instruction, as 
described in Section 3.3.5 on page 3-33. The format of the regis- 
ters is shown in Figure 3-8 on page 3-25 and Figure 3-9 on page 
3-26. 

If system software has cleared the MCE bit in CR4 to 0 before 
a bus-cycle error, the processor attempts to continue execution 
without generating a machine-check exception, although it still 
latches the address and cycle type in MCAR and MCTR as 
described above. 
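The rule in this section (MCAR and MCTR are always latched, while exception 12h is raised only when CR4.MCE is set) can be modeled in a few lines. This is a behavioral sketch, not processor code; the class and function names are hypothetical, and the cycle type is treated as an opaque integer rather than the actual MCTR format shown in Figure 3-9.

```python
from dataclasses import dataclass
from typing import Optional

CR4_MCE = 1 << 6             # Machine-Check Enable (Table 3-1)
MACHINE_CHECK_VECTOR = 0x12  # machine-check exception, 12h (18 decimal)


@dataclass
class MachineCheckRegs:
    mcar: Optional[int] = None  # physical address of the failed cycle
    mctr: Optional[int] = None  # cycle definition of the failed cycle (opaque here)


def on_bus_error(cr4, regs, phys_addr, cycle_type):
    """Model a BUSCHK or an enabled parity error.

    The failed cycle is always latched into MCAR/MCTR. Returns the exception
    vector to deliver, or None when MCE is clear and execution continues."""
    regs.mcar = phys_addr
    regs.mctr = cycle_type
    if cr4 & CR4_MCE:
        return MACHINE_CHECK_VECTOR
    return None


regs = MachineCheckRegs()
print(on_bus_error(0, regs, 0x000F_FFF0, 0x1), hex(regs.mcar))  # None 0xffff0
```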
3.1.2 4-Mbyte Pages

The TLBs in the 486 and 386 processors support only 4-Kbyte 
pages. However, large data structures such as a video frame 
buffer or non-paged operating system code can consume many 
pages and easily overrun the TLB. The AMD-K5 processor 
accommodates large data structures by allowing the operating 
system to specify 4-Mbyte pages as well as 4-Kbyte pages, and 
by implementing a four-entry, fully-associative 4-Mbyte TLB 
which is separate from the 128-entry, 4-Kbyte TLB. From a 
given page directory, the processor can access both 4-Kbyte 
pages and 4-Mbyte pages, and the page sizes can be intermixed 
within a page directory. When the Page Size Extension (PSE) 
bit in CR4 is set, the processor translates linear addresses 
using either the 4-Kbyte TLB or the 4-Mbyte TLB, depending 
on the state of the page size (PS) bit in the page-directory 
entry. Figures 3-2 and 3-3 show how 4-Kbyte and 4-Mbyte page 
translation work. 
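A quick calculation shows why the separate 4-Mbyte TLB helps. The sketch below uses the TLB sizes given above (128 entries for 4-Kbyte pages, four entries for 4-Mbyte pages); the 1-Mbyte frame buffer is a hypothetical example, assumed to be aligned so that it does not straddle a 4-Mbyte boundary.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE

SMALL_ENTRIES, SMALL_PAGE = 128, 4 * KBYTE  # 4-Kbyte TLB
LARGE_ENTRIES, LARGE_PAGE = 4, 4 * MBYTE    # 4-Mbyte TLB

small_reach = SMALL_ENTRIES * SMALL_PAGE  # memory mapped by a full 4-Kbyte TLB
large_reach = LARGE_ENTRIES * LARGE_PAGE  # memory mapped by a full 4-Mbyte TLB


def pages_needed(size, page_size):
    """Number of pages (and thus TLB entries) needed to map an aligned region."""
    return -(-size // page_size)  # ceiling division


frame_buffer = 1 * MBYTE  # hypothetical frame-buffer size
print(small_reach // KBYTE, "Kbytes vs", large_reach // MBYTE, "Mbytes")
print(pages_needed(frame_buffer, SMALL_PAGE), "vs",
      pages_needed(frame_buffer, LARGE_PAGE), "entries")
```

At 256 entries, this frame buffer alone would need twice the capacity of the 4-Kbyte TLB, but it fits in a single 4-Mbyte TLB entry.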



[Figure: the linear address indexes a 4-Kbyte page directory, then a 4-Kbyte page table, then a byte in a 4-Kbyte page]

Figure 3-2. 4-Kbyte Paging Mechanism

[Figure: the linear address indexes the page directory, whose entry maps a 4-Mbyte page directly]

Figure 3-3. 4-Mbyte Paging Mechanism



To enable the 4-Mbyte paging option: 

1. Set the Page Size Extension (PSE) bit in CR4 to 1. 

2. Set the Page Size (PS) bit in the page-directory entry to 1. 

3. Write the physical base addresses of 4-Mbyte pages in bits 
31-22 of page-directory entries. (Bits 21-12 of these entries 
must be cleared to 0 or the processor will generate a page 
fault.) 

4. Load CR3 with the base address of the page directory that 
contains these page-directory entries. 
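Steps 2 and 3 amount to composing 32-bit page-directory entries. The following Python sketch builds 4-Mbyte PDEs for a page directory held in a list. The helper and constant names are hypothetical, the bit positions come from Table 3-2, and steps 1 and 4 (setting CR4.PSE and loading CR3) need privileged MOV instructions, so they appear only as comments.

```python
PDE_P = 1 << 0   # Present
PDE_WR = 1 << 1  # Write/Read: 1 = writable
PDE_US = 1 << 2  # User/Supervisor
PDE_PS = 1 << 7  # Page Size: 1 = 4-Mbyte page
PDE_G = 1 << 8   # Global

FOUR_MBYTE = 4 * 1024 * 1024


def make_4mb_pde(phys_base, flags=PDE_P | PDE_WR):
    """Build a PDE that maps one 4-Mbyte page (steps 2 and 3)."""
    if not 0 <= phys_base < (1 << 32):
        raise ValueError("physical base must fit in 32 bits")
    if phys_base % FOUR_MBYTE:
        # The base occupies bits 31-22; bits 21-12 of the entry must be 0,
        # otherwise the processor generates a page fault.
        raise ValueError("4-Mbyte page base must be 4-Mbyte aligned")
    if flags & ~0xFFF:
        raise ValueError("flags must lie in bits 11-0")
    return phys_base | flags | PDE_PS


# Step 1 (not modeled): set CR4.PSE.
# Identity-map the first 16 Mbytes with four 4-Mbyte pages.
page_directory = [0] * 1024  # one 4-Kbyte page directory = 1024 entries
for i in range(4):
    page_directory[i] = make_4mb_pde(i * FOUR_MBYTE)
# Step 4 (not modeled): load CR3 with the directory's physical base address.
print([hex(e) for e in page_directory[:4]])
```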
Figure 3-1 and Table 3-1 show the fields in CR4. Figure 3-4 and
Table 3-2 show the fields in a page-directory entry. 

4-Kbyte page translation differs from 4-Mbyte page translation
in the following ways:

■ 4-Kbyte Paging (Figure 3-2): Bits 31-22 of the linear address
select an entry in the page directory in memory, whose
physical base address is stored in CR3. Bits 21-12 of the
linear address select an entry in a 4-Kbyte page table in
memory, whose physical base address is specified by bits
31-12 of the page-directory entry. Bits 11-0 of the linear
address select a byte in a 4-Kbyte page, whose physical base
address is specified by bits 31-12 of the page-table entry.

■ 4-Mbyte Paging (Figure 3-3): Bits 31-22 of the linear
address select an entry in the page directory in memory,
whose physical base address is stored in CR3. Bits 21-0 of
the linear address select a byte in a 4-Mbyte page in memory,
whose physical base address is specified by bits 31-22 of the
page-directory entry. Bits 21-12 of the page-directory entry
must be cleared to 0.
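The two translation paths can be sketched as a page-table walk. This Python model assumes CR4.PSE is set, represents physical memory as a dictionary of 32-bit entries, and omits privilege checks, Accessed/Dirty updates, and the TLBs; the function names and the example addresses are hypothetical.

```python
PDE_P, PDE_PS = 1 << 0, 1 << 7
PTE_P = 1 << 0


class PageFault(Exception):
    """Raised where the processor would generate a page fault."""


def translate(linear, cr3, read32):
    """Translate a 32-bit linear address; read32(addr) returns a 32-bit entry."""
    pde = read32((cr3 & 0xFFFFF000) + ((linear >> 22) & 0x3FF) * 4)
    if not pde & PDE_P:
        raise PageFault("PDE not present for %#x" % linear)
    if pde & PDE_PS:
        # 4-Mbyte page: PDE bits 31-22 give the page base and
        # linear bits 21-0 give the byte offset.
        if pde & 0x003FF000:
            raise PageFault("PDE bits 21-12 not cleared")
        return (pde & 0xFFC00000) | (linear & 0x003FFFFF)
    # 4-Kbyte page: PDE bits 31-12 locate the page table, linear bits 21-12
    # select the PTE, and linear bits 11-0 give the byte offset.
    pte = read32((pde & 0xFFFFF000) + ((linear >> 12) & 0x3FF) * 4)
    if not pte & PTE_P:
        raise PageFault("PTE not present for %#x" % linear)
    return (pte & 0xFFFFF000) | (linear & 0xFFF)


memory = {
    0x1000: 0x00002000 | PDE_P,           # PDE 0 -> page table at 2000h
    0x2014: 0x00093000 | PTE_P,           # PTE 5 -> 4-Kbyte page at 93000h
    0x1004: 0x00800000 | PDE_PS | PDE_P,  # PDE 1 -> 4-Mbyte page at 800000h
}


def read32(addr):
    return memory.get(addr, 0)


print(hex(translate(0x00005ABC, 0x1000, read32)))  # 0x93abc
print(hex(translate(0x00412345, 0x1000, read32)))  # 0x812345
```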
Table 3-2. Page-Directory Entry (PDE) Fields

| Bit | Mnemonic | Description | Function |
|---|---|---|---|
| 31-12 | BASE | Physical Base Address | For 4-Kbyte pages, bits 31-12 contain the physical base address of a 4-Kbyte page table. For 4-Mbyte pages, bits 31-22 contain the physical base address of a 4-Mbyte page and bits 21-12 must be cleared to 0. (The processor will generate a page fault if bits 21-12 are not cleared to 0.) |
| 11-9 | AVL | Available to Software | Software may use this field to store any type of information. When the page-directory entry is not present (P bit cleared), bits 31-1 become available to software. |
| 8 | G | Global | 0 = local, 1 = global. |
| 7 | PS | Page Size | 0 = 4-Kbyte, 1 = 4-Mbyte. |
| 6 | D | Dirty | For 4-Kbyte pages, this bit is undefined and ignored; the processor does not change it. For 4-Mbyte pages, the processor sets this bit to 1 during a write to the page that is mapped by this page-directory entry: 0 = not written, 1 = written. |
| 5 | A | Accessed | The processor sets this bit to 1 during a read or write to any page that is mapped by this page-directory entry. 0 = not read or written, 1 = read or written. |
| 4 | PCD | Page Cache Disable | Specifies cacheability for all pages mapped by this page-directory entry. Whether a location in a mapped page is actually cached also depends on several other factors. 0 = cacheable page, 1 = non-cacheable page. |
| 3 | PWT | Page Writethrough | Specifies writeback or writethrough cache protocol for all pages mapped by this page-directory entry. Whether a location in a mapped page is actually cached in a writeback or writethrough state also depends on several other factors. 0 = writeback page, 1 = writethrough page. |
| 2 | U/S | User/Supervisor | 0 = supervisor (CPL < 3), 1 = user (any CPL). |
| 1 | W/R | Write/Read | 0 = read or execute, 1 = write, read, or execute. |
| 0 | P | Present | 0 = not valid, 1 = valid. |
3.1.3 Global Pages

The processor’s performance can sometimes be improved by 
making some pages global to all tasks and procedures. This can 
be done for both 4-Kbyte pages and 4-Mbyte pages. 

The processor invalidates (flushes) both the 4-Kbyte TLB and 
the 4-Mbyte TLB whenever CR3 is loaded with the base 
address of the new task’s page directory. The processor loads 
CR3 automatically during task switches, and the operating
system can load CR3 at any other time. Unnecessary invalidation
of certain TLB entries can be avoided by marking those entries
as global (a global TLB entry references a global page). Global
entries remain in the TLB across such flushes and need not be
reloaded, which improves performance after each flush. For
example, entries that reference operating system code and data
pages that are always required can be retained across task
switches and procedure calls.

To specify individual pages as global: 

1. Set the Global Page Extension (GPE) bit in CR4. 

2. (Optional) Set the Page Size Extension (PSE) bit in CR4. 

3. Set the relevant Global (G) bit for that page: 

For 4-Kbyte pages — Set the G bit in both the page-directory 
entry (shown in Figure 3-4 and Table 3-2) and the page- 
table entry (shown in Figure 3-5 and Table 3-3). 

For 4-Mbyte pages — (Optional) After the PSE bit in CR4 is 
set, set the G bit in the page-directory entry (shown in Fig- 
ure 3-4 and Table 3-2). 

4. Load CR3 with the base address of the page directory. 

The INVLPG instruction clears both the V and G bits for the 
referenced entry. To invalidate all entries, including global- 
page entries, in both TLBs: 

1. Clear the Global Page Extension (GPE) bit in CR4. 

2. Load CR3 with the base address of another (or same) page 
directory. 
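The invalidation rules above can be summarized with a small model. The Python sketch below is hypothetical: it merges the 4-Kbyte and 4-Mbyte TLBs into one dictionary, ignores capacity and pseudo-random replacement, and follows the two-step procedure in the text literally, so it does not model any flush that writing CR4 itself may cause.

```python
class TlbModel:
    """Toy model of global-page retention across TLB invalidations."""

    def __init__(self):
        self.gpe = False   # CR4.GPE
        self.entries = {}  # linear page number -> (physical frame, global?)

    def fill(self, page, frame, is_global=False):
        # A global entry references a page whose G bit is set.
        self.entries[page] = (frame, is_global)

    def load_cr3(self):
        """MOV to CR3: flush everything except global entries while GPE = 1."""
        if self.gpe:
            self.entries = {p: e for p, e in self.entries.items() if e[1]}
        else:
            self.entries.clear()

    def invlpg(self, page):
        """INVLPG invalidates the referenced entry, global or not."""
        self.entries.pop(page, None)


tlb = TlbModel()
tlb.gpe = True
tlb.fill(0xC0000, 0x00100, is_global=True)  # e.g., an operating system page
tlb.fill(0x00400, 0x02000)                  # e.g., an application page
tlb.load_cr3()                              # task switch
print(sorted(tlb.entries))                  # only the global entry survives
tlb.gpe = False                             # step 1: clear GPE
tlb.load_cr3()                              # step 2: reload CR3
print(tlb.entries)                          # {} (everything flushed)
```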
Table 3-3. Page-Table Entry (PTE) Fields

| Bit | Mnemonic | Description | Function |
|---|---|---|---|
| 31-12 | BASE | Physical Base Address | The physical base address of a 4-Kbyte page. |
| 11-9 | AVL | Available to Software | Software may use this field to store any type of information. When the page-table entry is not present (P bit cleared), bits 31-1 become available to software. |
| 8 | G | Global | 0 = local, 1 = global. |
| 7 | PS | Page Size | This bit is ignored in page-table entries, although clearing it to 0 preserves consistent usage of this bit between page-table and page-directory entries. |
| 6 | D | Dirty | The processor sets this bit to 1 during a write to the page that is mapped by this page-table entry. 0 = not written, 1 = written. |
| 5 | A | Accessed | The processor sets this bit to 1 during a read or write to any page that is mapped by this page-table entry. 0 = not read or written, 1 = read or written. |
| 4 | PCD | Page Cache Disable | Specifies cacheability for all locations in the page mapped by this page-table entry. Whether a location is actually cached also depends on several other factors. 0 = cacheable page, 1 = non-cacheable page. |
| 3 | PWT | Page Writethrough | Specifies writeback or writethrough cache protocol for all locations in the page mapped by this page-table entry. Whether a location is actually cached in a writeback or writethrough state also depends on several other factors. 0 = writeback, 1 = writethrough. |
| 2 | U/S | User/Supervisor | 0 = supervisor (CPL < 3), 1 = user (any CPL). |
| 1 | W/R | Write/Read | 0 = read or execute, 1 = write, read, or execute. |
| 0 | P | Present | 0 = not valid, 1 = valid. |




AMD-K5 Processor Technical Reference Manual 



18524C/0- Nov 1996 



3.1.4 Virtual-8086 Mode Extensions (VME) 

The Virtual-8086 Mode Extensions (VME) bit in CR4 (bit 0) 
enables performance enhancements for 8086 programs run- 
ning as protected tasks in Virtual-8086 mode. These extensions 
include: 

■ Virtualizing maskable external interrupt control and notifi- 
cation via the VIF and VIP bits in EFLAGS 

■ Selectively intercepting software interrupts (INTn instruc- 
tions) via the Interrupt Redirection Bitmap (IRB) in the 
Task State Segment (TSS) 

8086 programs expect to have full access to the interrupt flag
(IF) in the EFLAGS register, which enables maskable external
interrupts via the INTR signal. When 8086 programs run in
Virtual-8086 mode on a 386 or 486 processor, they run as
protected tasks, and the operating system must control their
access to the IF flag on a task-by-task basis to prevent
corruption of system resources.

Without the VME extensions available on the AMD-K5 processor,
the operating system controls Virtual-8086 mode access to the IF
flag by trapping instructions that can read or write this flag.
These instructions include STI, CLI, PUSHF, POPF, INTn, and IRET.
This method prevents changes to the real IF when the I/O
privilege level (IOPL) in EFLAGS is less than 3, the privilege
level at which all Virtual-8086 tasks run. The operating system
maintains an image of the IF flag for each Virtual-8086 program
by emulating the instructions that read or write IF. When an
external maskable interrupt occurs, the operating system checks
the state of the IF image for the current Virtual-8086 program to
determine whether the program is allowing interrupts. If the
program has disabled interrupts, the operating system saves the
interrupt information until the program attempts to re-enable
interrupts.

The overhead of trapping and emulating the instructions that
enable and disable interrupts, and of maintaining virtual
interrupt flags for each Virtual-8086 program, can degrade the
processor's performance. This performance can be regained by
running Virtual-8086 programs with IOPL set to 3, which allows
changes to the real IF flag from any privilege level, but at the
cost of protection.
Hardware Interrupts and the VIF and VIP Extensions



In addition to these performance problems caused by
virtualization of the IF flag in Virtual-8086 mode, software
interrupts (those caused by INTn instructions that vector through
interrupt gates) cannot be masked by the IF flag or by virtual
copies of it; these flags affect only hardware interrupts.
Software interrupts in Virtual-8086 mode are normally directed to
the Real mode interrupt vector table (IVT), but it may be
desirable to redirect interrupts for certain vectors to the
Protected mode interrupt descriptor table (IDT).

The processor’s Virtual-8086 mode extensions support both of 
these cases — hardware (external) interrupts and software 
interrupts — with mechanisms that preserve high performance 
without compromising protection. Virtualization of hardware 
interrupts is supported via the Virtual Interrupt Flag (VIF) 
and Virtual Interrupt Pending (VIP) flag in the EFLAGS regis- 
ter. Redirection of software interrupts is supported with the 
Interrupt Redirection Bitmap (IRB) in the TSS of each Virtual- 
8086 program. 

When VME extensions are enabled, the IF-modifying instruc- 
tions that are normally trapped by the operating system are 
allowed to execute, but they write and read the VIF bit rather 
than the IF bit in EFLAGS. This leaves maskable interrupts 
enabled for detection by the operating system. It also indicates 
to the operating system whether the Virtual-8086 program is 
able to or expecting to receive interrupts. 

When an external interrupt occurs, the processor switches 
from the Virtual-8086 program to the operating system, in the 
same manner as on a 386 or 486 processor. If the operating sys- 
tem determines that the interrupt is for the Virtual-8086 pro- 
gram, it checks the state of the VIF bit in the program’s 
EFLAGS image on the stack. If VIF has been set by the proces- 
sor (during an attempt by the program to set the IF bit), the 
operating system permits access to the appropriate Virtual- 
8086 handler via the interrupt vector table (IVT). If VIF has 
been cleared, the operating system holds the interrupt pend- 
ing. The operating system can do this by saving appropriate 
information (such as the interrupt vector), setting the pro- 
gram's VIP flag in the EFLAGS image on the stack, and return- 
ing to the interrupted program. When the program 
subsequently attempts to set IF, the set VIP flag causes the 
processor to inhibit the instruction and generate a
general-protection exception with error code zero, thereby notifying
the operating system that the program is now prepared to 
accept the interrupt. 

Thus, when VME extensions are enabled, the VIF and VIP bits 
are set and cleared as follows: 

■ VIF — This bit is controlled by the processor and used by the 
operating system to determine whether an external 
maskable interrupt should be passed on to the program or 
held pending. VIF is set and cleared for instructions that 
can modify IF, and it is cleared during software interrupts 
through interrupt gates. The original IF value is preserved 
in the EFLAGS image on the stack. 

■ VIP — This bit is set and cleared by the operating system via 
the EFLAGS image on the stack. It is set when an interrupt 
occurs for a Virtual-8086 program whose VIF bit is cleared.
The bit is checked by the processor when the program sub- 
sequently attempts to set VIF. 
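The division of labor between the processor and the operating system can be sketched as two functions: one for the operating system's decision when an external interrupt arrives for a Virtual-8086 program, and one for the processor's handling of STI by that program with VME enabled and IOPL below 3. The names, the pending-interrupt list, and the string results are hypothetical; only the EFLAGS bit positions (IF = 9, VIF = 19, VIP = 20) come from this chapter.

```python
EFLAGS_IF = 1 << 9
EFLAGS_VIF = 1 << 19
EFLAGS_VIP = 1 << 20


def os_on_external_interrupt(eflags_image, vector, pending):
    """Operating-system decision for an INTR aimed at a Virtual-8086 program.

    eflags_image is the program's EFLAGS image on the stack. Returns the
    (possibly updated) image and the action taken."""
    if eflags_image & EFLAGS_VIF:
        return eflags_image, "reflect"     # pass to the program's IVT handler
    pending.append(vector)                 # hold the interrupt pending
    return eflags_image | EFLAGS_VIP, "pend"


def cpu_sti_with_vme(eflags):
    """Processor handling of STI in Virtual-8086 mode, VME = 1, IOPL < 3."""
    if eflags & EFLAGS_VIP:
        # Inhibit STI and raise GP(0) so the OS can deliver the pending interrupt.
        return eflags, "GP(0)"
    return eflags | EFLAGS_VIF, "ok"       # the real IF is not changed


pending = []
image, action = os_on_external_interrupt(0, 0x08, pending)
print(action, pending, cpu_sti_with_vme(image)[1])  # pend [8] GP(0)
```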

Figure 3-6 and Table 3-4 show the VIF and VIP bits in the 
EFLAGS register. The VME extensions support conventional 
emulation methods for passing interrupts to Virtual-8086 pro- 
grams, but they make it possible for the operating system to 
avoid time-consuming emulation of most instructions that 
write or read the IF. 

The VIF and IF flags only affect the way the operating system 
deals with hardware interrupts (the INTR signal). Software 
interrupts are handled like machine-generated exceptions and 
cannot be masked by real or virtual copies of IF (see page 3-21).
The VIF and VIP flags only ease the software overhead
associated with managing interrupts so that virtual copies of 
the IF flag do not have to be maintained by the operating sys- 
tem. Instead, each task’s TSS holds its own copy of these flags 
in its EFLAGS image. 
[Figure: EFLAGS bit layout: 21 ID Flag (ID), 20 Virtual Interrupt Pending (VIP), 19 Virtual Interrupt Flag (VIF), 18 Alignment Check (AC), 17 Virtual-8086 Mode (VM), 16 Resume Flag (RF), 14 Nested Task (NT), 13-12 I/O Privilege Level (IOPL), 11 Overflow Flag (OF), 10 Direction Flag (DF), 9 Interrupt Flag (IF), 8 Trap Flag (TF), 7 Sign Flag (SF), 6 Zero Flag (ZF), 4 Auxiliary Flag (AF), 2 Parity Flag (PF), 0 Carry Flag (CF); all other bits reserved]

Figure 3-6. EFLAGS Register



Table 3-4. Virtual-Interrupt Additions to EFLAGS Register

| Bit | Mnemonic | Description | Function |
|---|---|---|---|
| 20 | VIP | Virtual Interrupt Pending | Set by the operating system (via the EFLAGS image on the stack) when an external maskable interrupt (INTR) occurs for a Virtual-8086 program whose VIF bit is cleared. The bit is checked by the processor when the program subsequently attempts to set VIF. |
| 19 | VIF | Virtual Interrupt Flag | When the VME bit in CR4 is set, the VIF bit is modified by the processor when a Virtual-8086 program running at less privilege than the IOPL attempts to modify the IF bit. The operating system uses the VIF bit to determine whether a maskable interrupt should be passed on to the program or held pending. |
Table 3-5A through Table 3-5E show the effects, in various
x86-processor modes, of instructions that read or write the IF
and VIF flags. The column headings in these tables include the
following values:

■ PE: Protection Enable bit in CR0 (bit 0)

■ VM: Virtual-8086 Mode bit in EFLAGS (bit 17)

■ VME: Virtual-8086 Mode Extensions bit in CR4 (bit 0)

■ PVI: Protected-mode Virtual Interrupts bit in CR4 (bit 1)

■ IOPL: I/O Privilege Level bits in EFLAGS (bits 13-12)

■ Handler CPL: Current privilege level (CPL) of the interrupt
handler

■ GP(0): General-protection exception, with error code = 0

■ IF: Interrupt Flag bit in EFLAGS (bit 9)

■ VIF: Virtual Interrupt Flag bit in EFLAGS (bit 19)



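Table 3-5A. Instructions that Modify the IF or VIF Flags: Real Mode

| Type | PE | VM | VME | PVI | IOPL | GP(0) | IF | VIF |
|---|---|---|---|---|---|---|---|---|
| CLI | 0 | 0 | 0 | 0 | - | No | IF ← 0 | - |
| STI | 0 | 0 | 0 | 0 | - | No | IF ← 1 | - |
| PUSHF | 0 | 0 | 0 | 0 | - | No | Pushed | - |
| POPF | 0 | 0 | 0 | 0 | - | No | Popped | - |
| IRET | 0 | 0 | 0 | 0 | - | No | Popped | - |

Notes:

A dash (-) means not applicable.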
Table 3-5B. Instructions that Modify the IF or VIF Flags: Protected Mode

| Type | PE | VM | VME | PVI | IOPL | Handler CPL | GP(0) | IF | VIF |
|---|---|---|---|---|---|---|---|---|---|
| CLI | 1 | 0 | - | 0 | ≥ CPL | - | No | IF ← 0 | - |
| CLI | 1 | 0 | - | 0 | < CPL | - | Yes | - | - |
| STI | 1 | 0 | - | 0 | ≥ CPL | - | No | IF ← 1 | - |
| STI | 1 | 0 | - | 0 | < CPL | - | Yes | - | - |
| PUSHF | 1 | 0 | - | 0 | ≥ CPL | - | No | Pushed | - |
| PUSHF | 1 | 0 | - | 0 | < CPL | - | No | Pushed | - |
| PUSHFD | 1 | 0 | - | 0 | ≥ CPL | - | No | Pushed | Pushed |
| PUSHFD | 1 | 0 | - | 0 | < CPL | - | No | Pushed | Pushed |
| POPF | 1 | 0 | - | 0 | ≥ CPL | - | No | Popped | - |
| POPF | 1 | 0 | - | 0 | < CPL | - | No | Not Popped | - |
| POPFD | 1 | 0 | - | 0 | ≥ CPL | - | No | Popped | Not Popped |
| POPFD | 1 | 0 | - | 0 | < CPL | - | No | Not Popped | Not Popped |
| IRET | 1 | 0 | - | 0 | - | = 0 | No | Popped | - |
| IRET | 1 | 0 | - | 0 | ≥ CPL | > 0 | No [1] | Popped | - |
| IRET | 1 | 0 | - | 0 | < CPL | > 0 | No [1] | Not Popped | - |
| IRETD | 1 | 0 | - | 0 | - | = 0 | No | Popped | Popped |
| IRETD | 1 | 0 | - | 0 | ≥ CPL | > 0 | No [1] | Popped | Not Popped |
| IRETD | 1 | 0 | - | 0 | < CPL | > 0 | No [1] | Not Popped | Not Popped |

Notes:

1. GP(0) if the CPL of the task executing IRET or IRETD is greater than the CPL of the task returned to.

A dash (-) means not applicable.
Table 3-5C. Instructions that Modify the IF or VIF Flags: Virtual-8086 Mode

| Type | PE | VM | VME | PVI | IOPL | GP(0) | IF | VIF |
|---|---|---|---|---|---|---|---|---|
| CLI | 1 | 1 | 0 | - | 3 | No | IF ← 0 | No Change |
| CLI | 1 | 1 | 0 | - | < 3 | Yes | - | - |
| STI | 1 | 1 | 0 | - | 3 | No | IF ← 1 | No Change |
| STI | 1 | 1 | 0 | - | < 3 | Yes | - | - |
| PUSHF | 1 | 1 | 0 | - | 3 | No | Pushed | - |
| PUSHF | 1 | 1 | 0 | - | < 3 | Yes | - | - |
| PUSHFD | 1 | 1 | 0 | - | 3 | No | Pushed | Pushed |
| PUSHFD | 1 | 1 | 0 | - | < 3 | Yes | - | - |
| POPF | 1 | 1 | 0 | - | 3 | No | Popped | - |
| POPF | 1 | 1 | 0 | - | < 3 | Yes | - | - |
| POPFD | 1 | 1 | 0 | - | 3 | No | Popped | Not Popped |
| POPFD | 1 | 1 | 0 | - | < 3 | Yes | - | - |
| IRETD [2] | 1 | 1 | 0 | - | - | No | Popped | Popped |

Notes:

1. All Virtual-8086 mode tasks run at CPL = 3.

2. All protected virtual interrupt handlers run at CPL = 0.

A dash (-) means not applicable.
Table 3-5D. Instructions that Modify the IF or VIF Flags: Virtual-8086 Mode Interrupt Extensions (VME) [1]

| Type | PE | VM | VME | PVI | IOPL | GP(0) | IF | VIF |
|---|---|---|---|---|---|---|---|---|
| CLI | 1 | 1 | 1 | - | 3 | No | IF ← 0 | No Change |
| CLI | 1 | 1 | 1 | - | < 3 | No | No Change | VIF ← 0 |
| STI | 1 | 1 | 1 | - | 3 | No | IF ← 1 | No Change |
| STI | 1 | 1 | 1 | - | < 3 | No [3] | No Change | VIF ← 1 |
| PUSHF | 1 | 1 | 1 | - | 3 | No | Pushed | Not Pushed |
| PUSHF | 1 | 1 | 1 | - | < 3 | No | Not Pushed | Pushed into IF |
| PUSHFD | 1 | 1 | 1 | - | 3 | No | Pushed | Pushed |
| PUSHFD | 1 | 1 | 1 | - | < 3 | Yes | - | - |
| POPF | 1 | 1 | 1 | - | 3 | No | Popped | Not Popped |
| POPF | 1 | 1 | 1 | - | < 3 | No | Not Popped | Popped from IF |
| POPFD | 1 | 1 | 1 | - | 3 | No | Popped | Not Popped |
| POPFD | 1 | 1 | 1 | - | < 3 | Yes | - | - |
| IRET from V86 mode | 1 | 1 | 1 | - | 3 | No | Popped | Not Popped |
| IRET from V86 mode | 1 | 1 | 1 | - | < 3 | No [3] | Not Popped | Popped from IF |
| IRETD from V86 mode | 1 | 1 | 1 | - | 3 | No | Popped | Not Popped |
| IRETD from V86 mode | 1 | 1 | 1 | - | < 3 | Yes | - | - |
| IRETD from Protected mode [2] | 1 | 1 | 1 | - | - | No [3] | Popped | Popped |

Notes:

1. All Virtual-8086 mode tasks run at CPL = 3.

2. All protected virtual interrupt handlers run at CPL = 0.

3. GP(0) if an attempt is made to set VIF when VIP = 1.

A dash (-) means not applicable.
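The CLI and STI rows of Tables 3-5C and 3-5D reduce to a short decision procedure for a Virtual-8086 task (CPL = 3). The Python sketch below encodes only those rows; the function and exception names are hypothetical, and IOPL is read from EFLAGS bits 13-12.

```python
IF_BIT, VIF_BIT, VIP_BIT = 1 << 9, 1 << 19, 1 << 20


class GeneralProtection(Exception):
    """Stands in for a general-protection exception with error code 0."""


def v86_cli_sti(op, eflags, vme):
    """Return EFLAGS after CLI or STI in Virtual-8086 mode (Tables 3-5C, 3-5D)."""
    if op not in ("CLI", "STI"):
        raise ValueError(op)
    iopl = (eflags >> 12) & 3
    if iopl == 3:
        # IOPL = 3: the real IF changes and VIF is left alone.
        return eflags & ~IF_BIT if op == "CLI" else eflags | IF_BIT
    if not vme:
        raise GeneralProtection(op)      # Table 3-5C: trapped by the OS
    if op == "CLI":
        return eflags & ~VIF_BIT         # Table 3-5D: VIF <- 0, IF unchanged
    if eflags & VIP_BIT:
        raise GeneralProtection(op)      # Table 3-5D, note 3
    return eflags | VIF_BIT              # Table 3-5D: VIF <- 1, IF unchanged


IOPL3 = 3 << 12
print(hex(v86_cli_sti("STI", IOPL3, vme=False)))  # 0x3200
print(hex(v86_cli_sti("STI", IF_BIT, vme=True)))  # 0x80200
```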
Table 3-5E. Instructions that Modify the IF or VIF Flags: Protected Mode Virtual Interrupt Extensions (PVI) [1]

| Type | PE | VM | VME | PVI | IOPL | GP(0) | IF | VIF |
|---|---|---|---|---|---|---|---|---|
| CLI | 1 | 0 | - | 1 | 3 | No | IF ← 0 | No Change |
| CLI | 1 | 0 | - | 1 | < 3 | No | No Change | VIF ← 0 |
| STI | 1 | 0 | - | 1 | 3 | No | IF ← 1 | No Change |
| STI | 1 | 0 | - | 1 | < 3 | No [3] | No Change | VIF ← 1 |
| PUSHF | 1 | 0 | - | 1 | 3 | No | Pushed | Not Pushed |
| PUSHF | 1 | 0 | - | 1 | < 3 | No | Pushed | Not Pushed |
| PUSHFD | 1 | 0 | - | 1 | 3 | No | Pushed | Pushed |
| PUSHFD | 1 | 0 | - | 1 | < 3 | No | Pushed | Pushed |
| POPF | 1 | 0 | - | 1 | 3 | No | Popped | Not Popped |
| POPF | 1 | 0 | - | 1 | < 3 | No | Not Popped | Not Popped |
| POPFD | 1 | 0 | - | 1 | 3 | No | Popped | Not Popped |
| POPFD | 1 | 0 | - | 1 | < 3 | No | Not Popped | Not Popped |
| IRETD [2] | 1 | 0 | - | 1 | - | No [3] | Popped | Popped |

Notes:

1. All Protected mode virtual interrupt tasks run at CPL = 3.

2. All Protected mode virtual interrupt handlers run at CPL = 0.

3. GP(0) if an attempt is made to set VIF when VIP = 1.

A dash (-) means not applicable.
Software Interrupts and the Interrupt Redirection Bitmap (IRB) Extension



In Virtual-8086 mode, software interrupts (INTn instructions
that vector through interrupt gates) are trapped by the operating
system for emulation, because they would otherwise clear the real
IF. When VME extensions are enabled, these INTn instructions are
allowed to execute normally, vectoring directly to a Virtual-8086
service routine via the Virtual-8086 interrupt vector table (IVT)
at address 0 of the task address space. However, it may still be
desirable for security or performance reasons to intercept INTn
instructions on a vector-specific basis so that they are serviced
by Protected-mode routines accessed through the interrupt
descriptor table (IDT). This is accomplished by an Interrupt
Redirection Bitmap (IRB) in the TSS, which the operating system
creates in a manner similar to the I/O Permission Bitmap (IOPB)
in the TSS.

Figure 3-7 shows the format of the TSS, with the Interrupt 
Redirection Bitmap near the top. The IRB contains 256 bits, 
one for each possible software-interrupt vector. The most-sig- 
nificant bit of the IRB is located immediately below the base of 
the IOPB. This bit controls interrupt vector 255. The least-sig- 
nificant bit of the IRB controls interrupt vector 0. 

The bits in the IRB work as follows: 

■ Set — If set to 1, the INTn instruction behaves as if the VME 
extensions are not enabled. The interrupt vectors to a Pro- 
tected-mode routine if IOPL = 3, or it causes a general-pro- 
tection exception with error code zero if IOPL<3. 

■ Cleared — If cleared to 0, the INTn instruction vectors 
directly to the corresponding Virtual-8086 service routine 
via the Virtual-8086 program’s IVT. 
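Locating the IRB bit for a given vector is simple bit arithmetic. In this hypothetical Python sketch, tss is the raw TSS contents and iopb_base is the offset of the I/O Permission Bitmap within the TSS; the 32-byte IRB is assumed to end immediately below that offset, with vector 0 in the least-significant bit of the lowest byte and vector 255 in the most-significant bit of the highest byte, as described above. The example offsets are illustrative only.

```python
def irb_bit(tss, iopb_base, vector):
    """Return the IRB bit (0 or 1) that controls INTn for this vector."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be 0-255")
    irb_start = iopb_base - 32  # 256 bits = 32 bytes directly below the IOPB
    return (tss[irb_start + vector // 8] >> (vector % 8)) & 1


def int_n_target(tss, iopb_base, vector, iopl):
    """Where INTn goes in Virtual-8086 mode with VME enabled."""
    if irb_bit(tss, iopb_base, vector) == 0:
        return "IVT"  # vector directly to the Virtual-8086 service routine
    return "IDT" if iopl == 3 else "GP(0)"  # behave as if VME were disabled


iopb_base = 0x88                          # hypothetical IOPB offset
tss = bytearray(iopb_base)                # IRB occupies offsets 68h-87h
tss[0x68 + 0x21 // 8] |= 1 << (0x21 % 8)  # redirect INT 21h to Protected mode
print(int_n_target(tss, iopb_base, 0x21, iopl=0),
      int_n_target(tss, iopb_base, 0x10, iopl=0))  # GP(0) IVT
```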

Only software interrupts can be redirected via the IRB to a 
Real mode IVT — hardware interrupts cannot. Hardware inter- 
rupts are asynchronous events and do not belong to any cur- 
rent virtual task. The processor thus has no way of deciding 
which IVT (for which Virtual-8086 program) to direct a hard- 
ware interrupt to. Because of this, hardware interrupts always 
require operating system intervention. The VIF and VIP bits 
described on page 3-13 are provided to assist the operating sys- 
tem in this intervention. 
Table 3-6 compares the behavior of hardware and software
interrupts in various x86-processor operating modes. It also
shows which interrupt table is accessed: the Protected-mode
IDT or the Real- and Virtual-8086-mode IVT. The column headings
in this table include:

■ PE: Protection Enable bit in CR0 (bit 0)

■ VM: Virtual-8086 Mode bit in EFLAGS (bit 17)

■ VME: Virtual-8086 Mode Extensions bit in CR4 (bit 0)

■ PVI: Protected-Mode Virtual Interrupts bit in CR4 (bit 1)

■ IOPL: I/O Privilege Level bits in EFLAGS (bits 13-12)

■ IRB: Interrupt Redirection Bit for a task, from the Interrupt
Redirection Bitmap (IRB) in the task's TSS

■ GP(0): General-protection exception, with error code = 0

■ IDT: Protected-Mode Interrupt Descriptor Table

■ IVT: Real- and Virtual-8086 Mode Interrupt Vector Table



Table 3-6. Interrupt Behavior and Interrupt-Table Access

Mode                     Interrupt  PE  VM  VME  PVI  IOPL  IRB  GP(0)  IDT  IVT
                         Type
Real mode                Software   0   0   0    -    0     -    -      -    ✓
                         Hardware   0   0   0    -    0     -    -      -    ✓
Protected mode           Software   1   0   0    -    -     -    -      ✓    -
                         Hardware   1   0   0    -    -     -    -      ✓    -
Virtual-8086 mode¹       Software   1   1   0    -    = 3   -    No     ✓    -
                         Software   1   1   0    -    < 3   -    Yes    ✓    -
                         Hardware   1   1   0    -    -     -    No     ✓    -
Virtual-8086 Mode        Software   1   1   1    0    -     0    No     -    ✓
Extensions (VME)¹        Software   1   1   1    0    = 3   1    No     ✓    -
                         Software   1   1   1    0    < 3   1    Yes    ✓    -
                         Hardware   1   1   1    0    -     -    No     ✓    -
Protected Virtual        Software   1   0   1    1    -     -    No     ✓    -
Interrupts (PVI)         Hardware   1   0   1    1    -     -    No     ✓    -

Notes:

1. All Virtual-8086 tasks run at CPL = 3.
- Not applicable.
✓ Interrupt table accessed.
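The decision that Table 3-6 encodes can be cross-checked with a short Python sketch. It is a reading aid, not a processor model: the names Dispatch, dispatch, and irb_bit are hypothetical, the PVI bit is left out because the table shows it does not change which table is used, and the interrupt redirection bitmap is assumed to be supplied as 32 bytes in which bit n corresponds to vector n (how the bitmap is located in the TSS is not modeled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispatch:
    table: str  # "IVT" (Real/Virtual-8086 vector table) or "IDT"
    gp0: bool   # True if INT n raises GP(0), which is itself handled via the IDT


def irb_bit(bitmap: bytes, vector: int) -> int:
    """Return the redirection bit for a vector: bit n of the 256-bit map is vector n."""
    return (bitmap[vector >> 3] >> (vector & 7)) & 1


def dispatch(pe: int, vm: int, vme: int, iopl: int, software: bool,
             vector: int = 0, bitmap: bytes = bytes(32)) -> Dispatch:
    """Follow the rows of Table 3-6 for a single interrupt."""
    if not pe:
        return Dispatch("IVT", False)   # Real mode: always the IVT
    if not vm or not software:
        return Dispatch("IDT", False)   # Protected/PVI mode, or any hardware interrupt
    if vme and irb_bit(bitmap, vector) == 0:
        return Dispatch("IVT", False)   # VME with IRB = 0: redirected to the V86 IVT
    return Dispatch("IDT", iopl < 3)    # IOPL-sensitive INT n: GP(0) if IOPL < 3
```

For example, with every bitmap bit clear, dispatch(1, 1, 1, 0, True, 0x21) returns Dispatch(table='IVT', gp0=False), matching the first VME row even though IOPL is 0.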
3.1.5 Protected Virtual Interrupt (PVI) Extensions

The Protected Virtual Interrupts (PVI) bit in CR4 enables sup- 
port for interrupt virtualization in Protected mode. In this vir- 
tualization, the processor maintains program-specific VIF and 
VIP flags in a manner similar to those in Virtual-8086 Mode 
Extensions (VME). When a program is executed at CPL = 3, it 
can set and clear its copy of the VIF flag without causing gen- 
eral-protection exceptions. 

The only differences between the VME and PVI extensions are 
that, in PVI, selective INTn interception using the Interrupt 
Redirection Bitmap in the TSS does not apply, and only the STI 
and CLI instructions are affected by the extension. 

Table 3-5A through Table 3-5E and Table 3-6 show, among 
other things, the behavior of hardware and software inter- 
rupts, and instructions that affect interrupts, in Protected 
mode with the PVI extensions enabled. 
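The effect of PVI on CLI and STI can be summarized as a small decision function. This Python sketch assumes the Pentium-compatible rules: IF is updated when CPL ≤ IOPL; otherwise, at CPL = 3 with PVI set, only the program's VIF copy is updated; STI raises GP(0) when VIP is already set; every other case raises GP(0). Tables 3-5A through 3-5E remain the authoritative description, and cli_sti and GeneralProtection are hypothetical names.

```python
class GeneralProtection(Exception):
    """Stands in for a general-protection exception with error code 0."""


def cli_sti(instr: str, cpl: int, iopl: int, pvi: bool, flags: dict) -> dict:
    """Return a copy of flags ({'IF', 'VIF', 'VIP'}) updated for CLI or STI.

    Protected mode only (EFLAGS.VM = 0); raises GeneralProtection for GP(0).
    """
    new = dict(flags)
    value = 1 if instr == "STI" else 0
    if cpl <= iopl:
        new["IF"] = value                  # normal IOPL-sensitive case
    elif cpl == 3 and pvi:
        if instr == "STI" and flags["VIP"]:
            raise GeneralProtection("STI with a virtual interrupt pending (VIP = 1)")
        new["VIF"] = value                 # only the program's virtual copy changes
    else:
        raise GeneralProtection(instr)
    return new
```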



3-24

Software Environment and Extensions

18524C/0 — Nov 1996

AMD-K5 Processor Technical Reference Manual



3.2 Model-Specific Registers (MSRs) 



The processor supports model-specific registers (MSRs) that 
can be accessed with the RDMSR and WRMSR instructions 
when CPL = 0. The following index values in the ECX register 
access specific MSRs: 

■ 00h: Machine-Check Address Register (MCAR)

■ 01h: Machine-Check Type Register (MCTR)

■ 10h: Time Stamp Counter (TSC)

■ 82h: Array Access Register (AAR)

■ 83h: Hardware Configuration Register (HWCR) 

The RDMSR and WRMSR instructions are described in Section 
3.3.5 on page 3-33. The following sections describe the format 
of the registers. 

3.2.1 Machine-Check Address Register (MCAR) 

The processor latches the address of the current bus cycle in 
its 64-bit Machine-Check Address Register (MCAR) when a 
bus-cycle error occurs. These errors are indicated either by (a) 
system logic asserting BUSCHK, or (b) the processor asserting 
PCHK while system logic asserts PEN. 

The MCAR can be read with the RDMSR instruction when the
ECX register contains the value 00h. Figure 3-8 shows the
format of the MCAR register.

If system software has set the MCE bit in CR4 before the bus- 
cycle error, the processor also generates a machine-check 
exception as described in Section 3.1.1 on page 3-4. 



63 0 



Physical Address of Last Bus Cycle that Failed 



Figure 3-8. Machine-Check Address Register (MCAR) 
3.2.2 Machine-Check Type Register (MCTR)

The processor latches the cycle definition and other informa- 
tion about the current bus cycle in its 64-bit Machine-Check 
Type Register (MCTR) at the same times that the Machine- 
Check Address Register (MCAR) latches the cycle address: 
when a bus-cycle error occurs. These errors are indicated 
either by (a) system logic asserting BUSCHK, or (b) the proces- 
sor asserting PCHK while system logic asserts PEN. 

The MCTR can be read with the RDMSR instruction when the
ECX register contains the value 01h. Figure 3-9 and Table 3-7
show the format of the MCTR register. The processor clears
the CHK bit (bit 0) in MCTR when the register is read with
the RDMSR instruction.

If system software has set the MCE bit in CR4 before the bus- 
cycle error, the processor also generates a machine-check 
exception as described in Section 3.1.1 on page 3-4. 



Bits 63-5   Reserved
Bit 4       LOCK   Locked Cycle
Bit 3       M/IO   Memory or I/O Cycle
Bit 2       D/C    Data or Code Cycle
Bit 1       W/R    Write or Read Cycle
Bit 0       CHK    Valid Machine-Check Data

Figure 3-9. Machine-Check Type Register (MCTR)
Table 3-7. Machine-Check Type Register (MCTR) Fields

Bit  Mnemonic  Description        Function
4    LOCK      Locked Cycle       Set to 1 if the processor was asserting LOCK
                                  during the bus cycle.
3    M/IO      Memory or I/O      1 = memory cycle, 0 = I/O cycle.
2    D/C       Data or Code       1 = data cycle, 0 = code cycle.
1    W/R       Write or Read      1 = write cycle, 0 = read cycle.
0    CHK       Valid Machine-     The processor sets the CHK bit to 1 when both
               Check Data         the MCTR and MCAR registers contain valid
                                  information. The processor clears the CHK bit
                                  to 0 when software reads the MCTR with the
                                  RDMSR instruction.
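A machine-check handler typically reads the MCTR once and decodes the fields in Table 3-7. The Python sketch below shows that decoding for the 64-bit value returned by RDMSR (EDX:EAX combined); MctrFields, decode_mctr, and describe are illustrative names. Because reading the MCTR clears CHK, a handler would keep the decoded copy rather than read the register again.

```python
from typing import NamedTuple


class MctrFields(NamedTuple):
    lock: bool    # bit 4, LOCK: processor was asserting LOCK
    memory: bool  # bit 3, M/IO: True = memory cycle, False = I/O cycle
    data: bool    # bit 2, D/C: True = data cycle, False = code cycle
    write: bool   # bit 1, W/R: True = write cycle, False = read cycle
    valid: bool   # bit 0, CHK: MCTR and MCAR hold valid information


def decode_mctr(value: int) -> MctrFields:
    """Decode the 64-bit MCTR value (EDX:EAX from RDMSR with ECX = 01h)."""
    def bit(n: int) -> bool:
        return bool((value >> n) & 1)
    # Bits 63-5 are reserved and ignored.
    return MctrFields(lock=bit(4), memory=bit(3), data=bit(2), write=bit(1), valid=bit(0))


def describe(fields: MctrFields) -> str:
    """Summarize the failing bus cycle in words."""
    if not fields.valid:
        return "no valid machine-check data"
    parts = [
        "memory" if fields.memory else "I/O",
        "data" if fields.data else "code",
        "write" if fields.write else "read",
    ]
    prefix = "locked " if fields.lock else ""
    return prefix + " ".join(parts) + " cycle"
```

For example, an MCTR value of 1Fh decodes as a locked memory data write cycle.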



3.2.3 Time Stamp Counter (TSC) 

With each processor clock cycle, the processor increments a 64- 
bit time stamp counter (TSC) model-specific register. The 
counter can be written or read using the WRMSR or RDMSR 
instructions when the ECX register contains the value 10h and
CPL = 0. The counter can also be read using the RDTSC 
instruction (see Section 3.3.4 on page 3-32) but the required 
privilege level for this instruction is determined by the Time 
Stamp Disable (TSD) bit in CR4. With any of these instruc- 
tions, the EDX and EAX registers hold the upper and lower 
double-words (dwords) of the 64-bit value to be written to or 
read from the TSC, as follows: 

■ EDX — Upper 32 bits of TSC 

■ EAX — Lower 32 bits of TSC 

The TSC can be loaded with any arbitrary value. 
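Because EDX and EAX carry the two halves of the counter, and because the TSC can be loaded with any value (so it may wrap past 2^64 - 1), a measurement routine has to join the halves and subtract modulo 2^64. The helper names below are hypothetical; the sketch only illustrates the arithmetic.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def join_edx_eax(edx: int, eax: int) -> int:
    """Combine EDX (upper 32 bits) and EAX (lower 32 bits) into one 64-bit value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def split_edx_eax(value: int) -> tuple[int, int]:
    """Split a 64-bit value into the (EDX, EAX) pair that WRMSR expects."""
    value &= MASK64
    return value >> 32, value & MASK32


def elapsed_clocks(start: int, end: int) -> int:
    """Clocks between two TSC readings, correct even if the counter wrapped once."""
    return (end - start) & MASK64
```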

3.2.4 Array Access Register (AAR) 

The Array Access Register (AAR) contains pointers for testing 
the tag and data arrays for the instruction cache, data cache, 4- 
Kbyte TLB, and 4-Mbyte TLB. The AAR can be written or read 
with the WRMSR or RDMSR instruction when the ECX regis- 
ter contains the value 82h. 

For details on the AAR, see Section 7.4 on page 7-7.
3.2.5 Hardware Configuration Register (HWCR)

The Hardware Configuration Register (HWCR) contains con- 
figuration bits that control miscellaneous debugging functions. 
The HWCR can be written or read with the WRMSR or 
RDMSR instruction when the ECX register contains the value 
83h. 

For details on the HWCR, see Section 7.1 on page 7-3. 

3.3 New Instructions 



In addition to supporting all of the 486 processor instructions, 
the AMD-K5 processor implements the following instructions: 

■ CPUID 

■ CMPXCHG8B 

■ MOV to and from CR4 

■ RDTSC 

■ RDMSR 

■ WRMSR 

■ RSM 

■ Illegal instruction (reserved opcode) 
3.3.1 CPUID

mnemonic           opcode   description
CPUID              0FA2     Identify processor

Privilege:              Any level
Registers Affected:     EAX, EBX, ECX, EDX
Flags Affected:         none
Exceptions Generated:   Real, Virtual-8086 mode-none
                        Protected mode-none





The CPUID instruction identifies the type of processor and the features it supports. 
A 0 or 1 value written to the EAX register specifies what information will be 
returned by the instruction. 

The processor implements the ID flag (bit 21) in the EFLAGS register. By writing and 
reading this bit, software can verify that the processor will execute the CPUID 
instruction. 

For detailed instructions on processor and feature identification see the AMD Proces- 
sor Recognition application note, order# 20734. 

Table 3-8 outlines the AMD-K5 processor family codes and model codes with the 
CPU clock frequencies (MHz), bus frequencies (MHz), and P-rating strings (“Pxxx”). 



Table 3-8. CPU Clock Frequencies, Bus Frequencies, and P-Rating Strings

Family  Model  CPU Frequency  CPU Bus Frequency  P-Rating String
Code    Code   (MHz)          (MHz)              ("Pxxx")¹
5       0      75             50                 P75
               90             60                 P90
               100            66                 P100
        1      90             60                 P120
               100            66                 P133
               120            60                 P150
               133            66                 P166

Notes:

1. The CPUID instruction does not return a P-Rating string.

- This table does not constitute product announcements. Instead, the information in the table represents possible product offerings.
AMD will announce actual products based on availability and market demand.
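Table 3-8 is indexed by the family and model codes that CPUID reports. The Python sketch below assumes the standard x86 layout of the EAX value returned when CPUID is executed with EAX = 1 (stepping in bits 3-0, model in bits 7-4, family in bits 11-8), which this section does not itself spell out. Signature, decode_signature, TABLE_3_8, and possible_offerings are illustrative names, and the rows are grouped as the table lists them.

```python
from typing import NamedTuple


class Signature(NamedTuple):
    family: int
    model: int
    stepping: int


def decode_signature(eax: int) -> Signature:
    """Split the CPUID function 1 EAX value (assumed standard layout)."""
    return Signature(family=(eax >> 8) & 0xF, model=(eax >> 4) & 0xF, stepping=eax & 0xF)


# Table 3-8 rows: (family, model) -> [(CPU MHz, bus MHz, P-rating string)].
# CPUID does not return the P-rating; it is included here only for reference.
TABLE_3_8 = {
    (5, 0): [(75, 50, "P75"), (90, 60, "P90"), (100, 66, "P100")],
    (5, 1): [(90, 60, "P120"), (100, 66, "P133"), (120, 60, "P150"), (133, 66, "P166")],
}


def possible_offerings(eax: int) -> list:
    """Return the Table 3-8 rows that share the reported family and model codes."""
    sig = decode_signature(eax)
    return TABLE_3_8.get((sig.family, sig.model), [])
```

A model code maps to several possible frequencies, so CPUID alone cannot recover the clock frequency or the P-rating.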
3.3.2 CMPXCHG8B

mnemonic           opcode   description
CMPXCHG8B r/m64    0FC7     Compare and exchange 8-byte operand

Privilege:              Any level
Registers Affected:     EAX, EBX, ECX, EDX
Flags Affected:         ZF
Exceptions Generated:   Real, Virtual-8086, Protected mode-GP(0) for all
                        standard cases. Invalid opcode if destination is a
                        register.
                        Virtual-8086 mode-Page fault



The CMPXCHG8B instruction is an 8-byte version of the 4-byte CMPXCHG instruction
supported by the 486 processor. CMPXCHG8B compares a value from memory
with the value in the EDX and EAX registers, as follows:

■ EDX — Upper 32 bits of compare value 

■ EAX — Lower 32 bits of compare value 

If the memory value matches the value in EDX and EAX, the ZF flag is set to 1 and 
the 8-byte value in ECX and EBX is written to the memory location, as follows: 

■ ECX — Upper 32 bits of exchange value 

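■ EBX — Lower 32 bits of exchange value

If the values do not match, the ZF flag is cleared to 0 and the 8-byte value read
from memory is loaded into EDX and EAX.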
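The data flow of CMPXCHG8B, including the mismatch case, can be modeled as a pure function. This Python sketch covers only the register and memory results; it does not model the locked bus cycle or the exceptions listed above, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def cmpxchg8b(memory: int, edx: int, eax: int, ecx: int, ebx: int) -> tuple[int, int, int, int]:
    """Model CMPXCHG8B m64; return (new_memory, zf, new_edx, new_eax)."""
    expected = ((edx & MASK32) << 32) | (eax & MASK32)
    if memory == expected:
        # Match: ZF = 1 and ECX:EBX is stored; EDX and EAX keep their values.
        return ((ecx & MASK32) << 32) | (ebx & MASK32), 1, edx, eax
    # Mismatch: ZF = 0, memory is unchanged, and its value is loaded into EDX:EAX.
    return memory, 0, memory >> 32, memory & MASK32
```

In a typical lock-free update, software loads the old value into EDX:EAX, computes the new value in ECX:EBX, and repeats the instruction until ZF is set; after a failure no separate reload is needed, because EDX:EAX already holds the current memory value.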
3.3.3 MOV to and from CR4

mnemonic           opcode   description
MOV CR4, r32       0F22     Move to CR4 from register
MOV r32, CR4       0F20     Move to register from CR4

Privilege:              CPL = 0
Registers Affected:     CR4, 32-bit general-purpose register
Flags Affected:         none
Exceptions Generated:   Real mode-none
                        Virtual-8086 mode-GP(0)
                        Protected mode-GP(0) if CPL not = 0



These instructions read and write control register 4 (CR4). 
3.3.4 RDTSC

mnemonic           opcode   description
RDTSC              0F31     Read time stamp counter

Privilege:              Selectable by TSD bit in CR4
Registers Affected:     EAX, EDX
Flags Affected:         none
Exceptions Generated:   Real, Virtual-8086 mode-Invalid opcode
                        Protected mode-GP(0) if CPL not = 0 when CR4.TSD = 1



The processor’s 64-bit time stamp counter (TSC) increments on each processor clock. 
In Real or Protected mode, the counter can be read with the RDMSR instruction and 
written with the WRMSR instruction when CPL = 0. In Protected mode, however, the
RDTSC instruction can also read the counter at CPL values greater than 0 (that is,
at less-privileged levels), as selected by the TSD bit in CR4.

The required privilege level for using the RDTSC instruction is determined by the 
Time Stamp Disable (TSD) bit in CR4, as follows: 

■ CPL = 0 — Set the TSD bit in CR4 to 1 

■ Any CPL — Clear the TSD bit in CR4 to 0 

The RDTSC instruction reads the counter value into the EDX and EAX registers as 
follows: 

■ EDX — Upper 32 bits of TSC 

■ EAX — Lower 32 bits of TSC 
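The TSD rule above amounts to a one-line privilege check. The following Python sketch models Protected-mode RDTSC under that rule; rdtsc and GeneralProtection are hypothetical names, and Real and Virtual-8086 mode are not modeled.

```python
class GeneralProtection(Exception):
    """Stands in for a general-protection exception with error code 0."""


def rdtsc(cpl: int, tsd: bool, tsc: int) -> tuple[int, int]:
    """Protected-mode RDTSC: return (EDX, EAX) or raise GP(0).

    CR4.TSD = 1 restricts RDTSC to CPL = 0; CR4.TSD = 0 allows any CPL.
    """
    if tsd and cpl != 0:
        raise GeneralProtection(f"RDTSC at CPL {cpl} with CR4.TSD = 1")
    tsc &= (1 << 64) - 1
    return tsc >> 32, tsc & 0xFFFFFFFF
```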

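The following example shows how the RDTSC instruction can be used. The code
clears the counter with WRMSR and then executes RDTSC twice. After the code is
executed, EDX and EAX contain the number of processor clocks counted since the
counter was cleared, which approximates the time required to execute the RDTSC
instruction. EDX is cleared along with EAX because WRMSR loads all 64 bits of the
counter from EDX:EAX. The code must run at CPL = 0 because it uses WRMSR.

    mov   ecx, 10h          ; Time Stamp Counter access via MSRs
    mov   eax, 00000000h    ; Initialize the counter to zero
    mov   edx, 00000000h    ;   (upper 32 bits as well)
    db    0Fh, 30h          ; WRMSR
    db    0Fh, 31h          ; RDTSC
    db    0Fh, 31h          ; RDTSC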
3.3.5 RDMSR and WRMSR

mnemonic           opcode   description
RDMSR              0F32     Read model-specific register (MSR)
WRMSR              0F30     Write model-specific register (MSR)

Privilege:              CPL = 0
Registers Affected:     EAX, ECX, EDX
Flags Affected:         none
Exceptions Generated:   Real mode-GP(0) for unimplemented MSR address
                        Virtual-8086 mode-GP(0)
                        Protected mode-GP(0) if CPL not = 0
                        Protected mode-GP(0) for unimplemented MSR address



The RDMSR or WRMSR instructions can be used in Real or Protected mode to access
several 64-bit, model-specific registers (MSRs). These registers are addressed by the
value in ECX, as follows:

■ 00h: Machine-Check Address Register (MCAR). This may contain the physical
address of the last bus cycle for which the BUSCHK or PCHK signal was asserted.
For details, see Section 3.1.1 on page 3-4.

■ 01h: Machine-Check Type Register (MCTR). This contains the cycle definition of
the last bus cycle for which the BUSCHK or PCHK signal was asserted. For
details, see Section 3.1.1 on page 3-4. The processor clears the CHK bit (bit 0) in
MCTR when the register is read with the RDMSR instruction.

■ 10h: Time Stamp Counter (TSC). This contains a time value. The TSC can be
initialized to any value with the WRMSR instruction, and it can be read with either
the RDMSR or RDTSC instruction. For details, see Section 3.2.3 on page 3-27.

■ 82h: Array Access Register (AAR). This contains an array pointer and test data
for testing the processor's cache and TLB arrays. For details on the AAR, see
Section 7.4 on page 7-7.

■ 83h: Hardware Configuration Register (HWCR). This contains configuration bits 
that control miscellaneous debugging functions. For details, see Section 7.1 on 
page 7-3. 
The value in ECX identifies the register to be read or written. The EDX and
EAX registers contain the MSR values to be read or written, as follows:

■ EDX — Upper 32 bits of MSR. For the AAR, this contains the array pointer and (in
contrast to all other MSRs) its contents are not altered by an RDMSR instruction.

■ EAX — Lower 32 bits of MSR. For the AAR, this contains the data to be read or
written.

All MSRs are 64 bits wide. However, the upper 32 bits of the AAR are write-only and 
are not returned on a read. EDX remains unaltered, making it more convenient to 
maintain the array pointer. 

If an attempt is made to execute either the RDMSR or WRMSR instruction when 
CPL is greater than 0, or to access an undefined model-specific register, the proces- 
sor generates a general-protection exception with error code zero. 
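The access rules above (CPL = 0, defined index only, CHK cleared on an MCTR read, EDX preserved on an AAR read) can be collected in a toy model. In this Python sketch the class and method names are hypothetical and the registers are plain integers; the AAR's array-test behavior and the effects of writing the HWCR are not modeled.

```python
class GeneralProtection(Exception):
    """Stands in for a general-protection exception with error code 0."""


MASK32 = 0xFFFFFFFF
MSR_NAMES = {0x00: "MCAR", 0x01: "MCTR", 0x10: "TSC", 0x82: "AAR", 0x83: "HWCR"}


class MsrFile:
    """Toy model of the RDMSR/WRMSR access rules; contents are plain integers."""

    def __init__(self) -> None:
        self.regs = {index: 0 for index in MSR_NAMES}

    def _check(self, ecx: int, cpl: int) -> None:
        # CPL > 0 or an undefined index both raise GP(0).
        if cpl != 0 or ecx not in MSR_NAMES:
            raise GeneralProtection(f"MSR {ecx:#04x} at CPL {cpl}")

    def wrmsr(self, ecx: int, edx: int, eax: int, cpl: int = 0) -> None:
        self._check(ecx, cpl)
        self.regs[ecx] = ((edx & MASK32) << 32) | (eax & MASK32)

    def rdmsr(self, ecx: int, edx: int, cpl: int = 0) -> tuple[int, int]:
        """Return the new (EDX, EAX); edx is the caller's EDX before RDMSR."""
        self._check(ecx, cpl)
        value = self.regs[ecx]
        if MSR_NAMES[ecx] == "MCTR":
            self.regs[ecx] = value & ~1     # reading the MCTR clears CHK (bit 0)
        if MSR_NAMES[ecx] == "AAR":
            return edx, value & MASK32      # AAR: EDX (array pointer) is left unaltered
        return value >> 32, value & MASK32
```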

Model-specific registers, as their name implies, may or may not be implemented by 
later models of the AMD-K5 processor. 
3.3.6 RSM

mnemonic           opcode   description
RSM                0FAA     Resume execution (exit System Management Mode)

Privilege:              CPL = 0
Registers Affected:     CS, DS, ES, FS, GS, SS, EIP, EFLAGS, LDTR,
                        CR3, EAX, EBX, ECX, EDX, ESP, EBP, EDI, ESI
Flags Affected:         none
Exceptions Generated:   Real, Virtual-8086 mode-Invalid opcode if not in SMM
                        Protected mode-Invalid opcode if not in SMM
                        Protected mode-GP(0) if CPL not = 0



The RSM instruction should be the last instruction in any System Management Mode
(SMM) service routine. It restores the processor state that was saved when the SMI
interrupt was asserted. This instruction is only valid when the processor is in SMM. It
generates an invalid opcode exception at all other times.

The processor enters the Shutdown state if any of the following illegal conditions are
encountered during the execution of the RSM instruction:

■ The SMM base value is not aligned on a 32-Kbyte boundary.

■ Any reserved bit of CR4 is set to 1.

■ The PG bit is set while the PE bit is cleared in CR0.

■ The NW bit is set while the CD bit is cleared in CR0.
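The four Shutdown conditions can be written as a single predicate. This Python sketch uses the standard CR0 bit positions (PE = bit 0, NW = bit 29, CD = bit 30, PG = bit 31); because this section does not list the reserved CR4 bits, the mask is a hypothetical parameter supplied by the caller, and the function name is illustrative.

```python
CR0_PE = 1 << 0
CR0_NW = 1 << 29
CR0_CD = 1 << 30
CR0_PG = 1 << 31


def rsm_would_shut_down(smm_base: int, cr0: int, cr4: int, cr4_reserved_mask: int) -> bool:
    """True if the state RSM is about to restore meets a Shutdown condition.

    cr4_reserved_mask is a hypothetical caller-supplied mask of reserved CR4 bits.
    """
    misaligned = smm_base % 0x8000 != 0              # not on a 32-Kbyte boundary
    reserved_cr4 = (cr4 & cr4_reserved_mask) != 0    # a reserved CR4 bit is set
    pg_without_pe = bool(cr0 & CR0_PG) and not (cr0 & CR0_PE)
    nw_without_cd = bool(cr0 & CR0_NW) and not (cr0 & CR0_CD)
    return misaligned or reserved_cr4 or pg_without_pe or nw_without_cd
```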
3.3.7 Illegal Instruction (Reserved Opcode)

mnemonic           opcode   description
(none)             0FFF     Illegal instruction (reserved opcode)

Privilege:              Any level
Registers Affected:     none
Flags Affected:         none
Exceptions Generated:   Real, Virtual-8086 mode-Invalid opcode
                        Protected mode-Invalid opcode



This opcode always generates an invalid opcode exception. The opcode will not be 
used in future AMD K86™ processors. 
Performance



This chapter provides information to assist fast execution and 
details on dispatch and execution timing for x86 instructions. 
Throughout the chapter, the terms clock and cycle refer to pro- 
cessor clock cycles, not bus clock (CLK) cycles. 

4.1 Code Optimization 



The code optimization suggestions in this section cover both 
general superscalar optimization (that is, techniques common 
to both the AMD-K5 and Pentium processors) and techniques 
specific to the AMD-K5 processor. In general, all optimization 
techniques used for the Pentium processor apply to any wide- 
issue x86 processor, but wider-issue designs like the AMD-K5 
processor have fewer restrictions. 

4.1.1 General Superscalar Techniques

■ Short Forms — Use shorter forms of instructions to increase 
the effective number of instructions that can be examined 
for decoding at any one time. Use 8-bit displacements and 
jump offsets where possible. 

■ Simple Instructions — Use simple instructions with hard- 
wired decode because they often perform more efficiently. 
Moreover, future implementations may increase the
penalties associated with microcoded instructions.

■ Dependencies — Spread out true dependencies to increase 
the opportunities for parallel execution. Antidependencies 
and output dependencies do not impact performance. 

■ Memory Operands — Instructions that operate on data in 
memory (load/op/store) can inhibit parallelism. Using sepa- 
rate move and ALU instructions allows independent opera- 
tions to be performed in parallel. On the other hand, if 
there are no opportunities for parallel execution, use the 
load/op/store forms to reduce the number of register spills 
(storing register values in memory to free registers for 
other uses) and increase code density. 

■ Register Operands — Maintain frequently used values in reg- 
isters or on the stack rather than in static storage. 

■ Branch Prediction — Use control-flow constructs that allow 
effective branch prediction. Although correctly predicted 
branches have no cost, mispredicted branches incur a
three-clock penalty.

■ Stack References — Use ESP for references to the stack so 
that EBP remains available for general use. 

■ Stack Allocation — When placing outgoing parameters on the 
stack, allocate space by adjusting the stack pointer (prefer- 
ably at the same time local storage is allocated on proce- 
dure entry) and use moves rather than pushes. This method 
of allocation allows random access to the outgoing parame- 
ters so that they may be set up when they are calculated, 
instead of having to be held somewhere else until the proce- 
dure call. This method also uses fewer execution resources 
(specifically, fewer register-file write ports when updating 
ESP). 

■ Shifts — Although there is only one shifter, certain shifts can 
be done using other execution units: for example, shift left 
1 by adding a value to itself. Use LEA index scaling to shift 
left by 1, 2, or 3. 

■ Data Embedded in Code — When data is embedded in the 
code segment, align it in separate cache blocks from nearby 
code to avoid some overhead in maintaining coherency 
between the instruction and data caches. 

■ Undefined Flags — Do not rely on the behavior of undefined 
flag results. 
■ Loops — Unroll loops to get more parallelism and reduce
loop overhead even with branch prediction. Inline small
routines to avoid procedure-call overhead. In both cases,
however, consider the cost of possible increased register
usage, which might add load/store instructions for register
spilling.

■ Indexed Addressing — There is no penalty for base + index
addressing in the AMD-K5 processor. However, future
implementations may have such a penalty to achieve a
higher overall clock rate.

4.1.2 Techniques Specific to the AMD-K5 Processor

■ Jumps and Loops — JCXZ requires 1 cycle (correctly pre- 
dicted) and therefore is faster than a TEST/JZ, in contrast 
to the Pentium processor in which JCXZ requires 5 or 6 
cycles. All forms of LOOP take 2 cycles (correctly pre- 
dicted), which is also faster than the Pentium processor's 7 
or 8 cycles. 

■ Multiplies — Independent IMULs can be pipelined at one 
per cycle with 4-cycle latency, in contrast to the Pentium 
processor's serialized 9-cycle time. (MUL has the same 
latency, although the implicit AX usage of MUL prevents 
independent, parallel MUL operations.) 

■ Dispatch Conflicts — Load-balancing (that is, selecting 
instructions for parallel decode) is still important, but to a 
lesser extent than on the Pentium processor. In particular, 
arrange instructions to avoid execution-unit dispatching 
conflicts. (See Section 4.2 on page 4-5.) 

■ Instruction Prefixes — There is no penalty for instruction pre- 
fixes, including combinations such as segment-size and 
operand-size prefixes. This is particularly important for 16- 
bit code. However, future implementations may have penal- 
ties for the use of these prefixes. 

■ Byte Operations — For byte operations, the high and low
bytes of AX, BX, CX, and DX are effectively independent
registers that can be operated on in parallel. For example,
reading AL does not have a dependency on an outstanding
write to AH.

■ Move and Convert— MOVZX, MOVSX, CBW, CWDE, CWD, 
CDQ all take 1 cycle (2 cycles for memory-based input), in 
contrast to the Pentium processor's 2 or 3 cycles. 
■ Bit Scan — BSF and BSR take 1 cycle (2 cycles for
memory-based input), in contrast to the Pentium processor's
data-dependent 6 to 34 cycles.

■ Bit Test — BT, BTS, BTR, and BTC take 1 cycle for register
operands, and 2 or 3 cycles for memory operands with an
immediate bit-offset, in contrast to the Pentium processor's
4 to 9 cycles. Memory-operand forms with a register
bit-offset take 5 cycles on the AMD-K5 processor. If the
semantics of the register bit-offset form are desired (where
the bit offset can cover a very large bit string in memory),
it is better to emulate this with simpler instructions that
can be interleaved with independent instructions for greater
parallelism.

■ Floating-Point Top-of-Stack Bottleneck — The AMD-K5 proces- 
sor has a pipelined floating-point unit. Greater parallelism 
can be achieved by using FXCH in parallel with floating- 
point operations to alleviate the top-of-stack bottleneck, as 
in the Pentium processor. The AMD-K5 processor also per- 
mits integer operations (ALU, branch, load/store) in paral- 
lel with floating-point operations. 

■ Locating Branch Targets — Performance can be sensitive to 
code alignment, especially in tight loops. Placing branch
targets within the first 17 bytes of the 32-byte cache line
maximizes the opportunity for parallel execution at the target.
NOPs can be added to adjust this alignment. The AMD-K5 
processor executes NOPs (opcode 90h) at the rate of two per 
cycle. Adding NOPs is even more effective if they execute 
in parallel with existing code. Other instructions of greater 
length, such as a register-immediate TEST instruction, can 
be used as NOPs to minimize the overhead of such padding. 

■ Branch Prediction — There are two branch prediction bits in 
a 32-byte instruction cache line. One bit applies to the first 
16 bytes of the line and the second bit applies to the second 
16 bytes of the line. For effective branch prediction, code 
should be generated with one branch per 16-byte line half. 
The prediction is associated with the half-line containing 
the last byte of the branch instruction. 

■ Address-Generation Interlocks (AGIs) — The AMD-K5 proces- 
sor does not suffer from the single-cycle penalty that the 
486 and Pentium processors have when a result from execu- 
tion or from a data-cache access is used to form a cache 
address, so it is not necessary to avoid these situations. 
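The branch-target and branch-prediction guidelines in Section 4.1.2 come down to simple address arithmetic. This Python sketch, with hypothetical function names, computes which half-line prediction bit a branch uses (from the address of its last byte), checks whether a branch target falls in the first 17 bytes of its 32-byte line, and flags half-lines that hold more than one branch.

```python
LINE = 32  # instruction-cache line size in bytes
HALF = 16  # one branch-prediction bit per 16-byte half-line


def prediction_slot(last_byte: int) -> tuple[int, int]:
    """Return (cache-line number, half 0 or 1) for a branch's last byte."""
    return last_byte // LINE, (last_byte % LINE) // HALF


def target_well_placed(target: int) -> bool:
    """True if a branch target lies in the first 17 bytes of its 32-byte line."""
    return target % LINE < 17


def shared_slots(branches: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return half-lines used by more than one branch.

    branches: (start address, length in bytes) for each branch instruction.
    """
    counts: dict[tuple[int, int], int] = {}
    for start, length in branches:
        slot = prediction_slot(start + length - 1)
        counts[slot] = counts.get(slot, 0) + 1
    return sorted(slot for slot, n in counts.items() if n > 1)
```

For instance, a 6-byte Jcc that starts at offset 0Eh of a line ends at offset 13h, so it uses the prediction bit of the second half-line even though it starts in the first half.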
4.2 Dispatch and Execution Timing



This section documents functional unit usage for each instruc- 
tion, along with relative cycle numbers for dispatch and execu- 
tion of the associated ROPs for the instruction. 

4.2.1 Notation 

Table 4-1 on page 4-8 contains the definitions for the integer 
instructions. Table 4-3 on page 4-19 contains the definitions for 
the floating-point instructions. The first column in these tables 
indicates the instruction mnemonic and operand types. The fol- 
lowing notations are used in the AMD-K5 microprocessor docu- 
mentation: 

■ reg — register 

■ mem — memory location 

■ imm — immediate value 

■ int_16 — 16-bit integer 

■ int_32 — 32-bit integer 

■ int_64 — 64-bit integer 

■ real_32 — 32-bit floating-point number 

■ real_64 — 64-bit floating-point number 

■ real_80 — 80-bit floating-point number 

If an operand refers to a specific register, the register name is 
used (e.g., AX, DX). When the register name is of the form Exx 
(e.g., EAX, ESI), the width of the register depends on the oper- 
and size attribute. 
The second column contains an identifier with the following
format:

    X XX XXXXXXXX XXX XXX

The fields, from left to right, are:

■ X: 1 = two-byte opcode (0F xx); 0 = one-byte opcode

■ XX: Addressing mode, where 0x = register, 10 = memory without
index, 1x = memory with or without index, and 11 = memory with index

■ XXXXXXXX: Opcode

■ XXX: MODrm[5:3]

■ XXX: MODrm[2:0]

In the tables, the fields are separated by underscores (for example,
0_0x_000000xx_xxx_xxx).

The third column in the tables indicates whether the
instruction is Fastpath (F) or Microcoded (M). Fastpath and
microcode ROM (MROM) ROPs cannot both be present in a decode stage at the same
time. If a microcoded instruction appears at the head of the 
byte queue without having been present in the queue on the 
previous cycle, there is a one-cycle penalty for MROM entry 
point generation. 

Each x86 instruction is converted into one or more ROPs. The 
fourth column shows the execution unit and timing for each of 
the ROPs. The ROP types and corresponding execution units 
are: 

■ ld — load/store

■ st — load/store

■ alu — either alu0 or alu1

■ alu0 — alu0 only

■ alu1 — alu1 only

■ brn — branch 

■ fadd — floating-point add pipe 

■ fmul — floating-point multiply pipe 

■ fpmv — floating-point move and compare pipe 

■ fpfill — floating-point upper half 
The x/y value following the ROP type indicates the relative
dispatch and execution cycle of the opcode, in the absence of any
conflicts. The format is: 

x/y[/z] 

where: 

■ x = Dispatch Cycle — The relative cycle in which the ROP is 
dispatched from decode to the reservation station. 

■ y = Execution Cycle — The relative cycle in which the ROP is 
issued from the reservation station to the execution unit. 

■ z = Result Cycle — The relative cycle in which the result is 
returned on the result bus. It is indicated only when the 
latency is greater than one cycle. For stores, it reflects the 
relative time that a store operand is available to be for- 
warded from the store buffer to a dependent load opera- 
tion. 

Using the time that the first ROP of an instruction is dis- 
patched to an execution unit as clock 1, the x/y value indicates 
in which clock each ROP is dispatched and executed relative to 
clock 1. The execution order and timing do not necessarily
match the dispatch order and timing. 

If any of the instructions read from or write to memory, it is 
assumed that the data exists in the cache. 
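The x/y[/z] entries can be read mechanically. The Python sketch below parses entries such as st 1/1/3 and reports the latest relative cycle in which any ROP of an instruction produces its result. It assumes that an entry without z has a one-cycle latency and therefore returns its result in its execution cycle y; Rop, parse_rop, and last_result_cycle are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Rop(NamedTuple):
    unit: str
    dispatch: int          # x: relative dispatch cycle
    execute: int           # y: relative execution cycle
    result: Optional[int]  # z: result cycle, given only for multi-cycle latency


ROP_RE = re.compile(r"^\s*(\w+)\s+(\d+)/(\d+)(?:/(\d+))?\s*$")


def parse_rop(text: str) -> Rop:
    """Parse one 'unit x/y[/z]' entry from the Execution Unit Timing column."""
    match = ROP_RE.match(text)
    if match is None:
        raise ValueError(f"not an x/y[/z] entry: {text!r}")
    unit, x, y, z = match.groups()
    return Rop(unit, int(x), int(y), int(z) if z else None)


def last_result_cycle(entries: list[str]) -> int:
    """Latest relative cycle in which any ROP of the instruction delivers a result.

    Assumption: an entry without z has one-cycle latency, so its result is
    taken to be available in its execution cycle y.
    """
    rops = [parse_rop(e) for e in entries]
    return max(r.result if r.result is not None else r.execute for r in rops)
```

Applied to the ADD mem, reg row (ld 1/1, alu 1/2, st 1/1/3), it returns 3.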
4.2.2 Integer Instructions

Table 4-1 shows the execution-unit usage for each integer 
instruction, along with relative cycle numbers for dispatch and 
execution of the associated ROPs for the instruction. 



Table 4-1. Integer Instructions

Instruction Mnemonic       Opcode Format          Fastpath or  Execution
                                                  Microcode    Unit Timing
ADD reg, reg               0_0x_000000xx_xxx_xxx  F            alu 1/1
ADD reg, mem               0_1x_0000001x_xxx_xxx  F            ld 1/1
                                                               alu 1/2
ADD mem, reg               0_1x_0000000x_xxx_xxx  F            ld 1/1
                                                               alu 1/2
                                                               st 1/1/3
ADD AL/AX/EAX, imm         0_xx_0000010x_xxx_xxx  F            alu 1/1
ADD reg, imm               0_0x_100000xx_000_xxx  F            alu 1/1
ADD mem, imm               0_1x_100000xx_000_xxx  F            ld 1/1
                                                               alu 1/2
                                                               st 1/1/3
AND reg, reg               0_0x_001000xx_xxx_xxx  F            alu 1/1
AND reg, mem               0_1x_0010001x_xxx_xxx  F            ld 1/1
                                                               alu 1/2
AND mem, reg               0_1x_0010000x_xxx_xxx  F            ld 1/1
                                                               alu 1/2
                                                               st 1/1/3
AND AL/AX/EAX, imm         0_xx_0010010x_xxx_xxx  F            alu 1/1
AND reg, imm               0_0x_100000xx_100_xxx  F            alu 1/1
AND mem, imm               0_1x_100000xx_100_xxx  F            ld 1/1
                                                               alu 1/2
                                                               st 1/1/3
BSF reg, reg               1_0x_10111100_xxx_xxx  F            alu1 1/1
BSF reg, mem               1_1x_10111100_xxx_xxx  F            ld 1/1
                                                               alu1 1/2
BSR reg, reg               1_0x_10111101_xxx_xxx  F            alu1 1/1
BSR reg, mem               1_1x_10111101_xxx_xxx  F            ld 1/1
                                                               alu1 1/2
BSWAP reg                  1_xx_11001xxx_xxx_xxx  F            alu1 1/1
BT reg, reg                1_0x_10100011_xxx_xxx  F            alu1 1/1
Table 4-1. Integer Instructions (continued)

Instruction Mnemonic       Opcode Format          Fastpath or  Execution
                                                  Microcode    Unit Timing
BT mem, reg                1_1x_10100011_xxx_xxx  M            alu1 1/1
                                                               alu 1/2
                                                               alu 2/3
                                                               ld 2/4
                                                               alu1 3/5
BT reg, imm                1_0x_10111010_100_xxx  F            alu1 1/1
BT mem, imm                1_1x_10111010_100_xxx  F            ld 1/1
                                                               alu1 1/2
BTC reg, reg               1_0x_10111011_xxx_xxx  F            alu1 1/1
BTC mem, reg               1_1x_10111011_xxx_xxx  M            alu1 1/1
                                                               alu 1/2
                                                               alu 2/3
                                                               ld 2/4
                                                               alu1 3/5
                                                               st 3/5/6
BTC reg, imm               1_0x_10111010_111_xxx  F            alu1 1/1
BTC mem, imm               1_1x_10111010_111_xxx  F            ld 1/1
                                                               alu1 1/2
                                                               st 1/1/3
BTR reg, reg               1_0x_10110011_xxx_xxx  F            alu1 1/1
BTR mem, reg               1_1x_10110011_xxx_xxx  M            alu1 1/1
                                                               alu 1/2
                                                               alu 2/3
                                                               ld 2/4
                                                               alu1 3/5
                                                               st 3/5/6
BTR reg, imm               1_0x_10111010_110_xxx  F            alu1 1/1
BTR mem, imm               1_1x_10111010_110_xxx  F            ld 1/1
                                                               alu1 1/2
                                                               st 1/1/3
BTS reg, reg               1_0x_10101011_xxx_xxx  F            alu1 1/1
BTS mem, reg               1_1x_10101011_xxx_xxx  M            alu1 1/1
                                                               alu 1/2
                                                               alu 2/3
                                                               ld 2/4
                                                               alu1 3/5
                                                               st 3/5/6
BTS reg, imm               1_0x_10111010_101_xxx  F            alu1 1/1
Table 4-1. Integer Instructions (continued)

Instruction Mnemonic       Opcode Format          Fastpath or  Execution
                                                  Microcode    Unit Timing
BTS mem, imm               1_1x_10111010_101_xxx  F            ld 1/1
                                                               alu1 1/2
                                                               st 1/1/3
CALL near relative         0_xx_11101000_xxx_xxx  M            alu 1/1
                                                               st 1/1/2
                                                               alu 1/1
                                                               brn 1/1
CALL near reg              0_0x_11111111_010_xxx  M            alu 1/1
                                                               st 1/1/2
                                                               alu 1/1
                                                               brn 1/1
CALL near mem              0_1x_11111111_010_xxx  M            alu 1/1
                                                               ld 1/1
                                                               st 1/1/2
                                                               alu 1/1
                                                               brn 2/2
CBW/CWDE                   0_xx_10011000_xxx_xxx  F            alu1 1/1
CMP reg, reg               0_0x_001110xx_xxx_xxx  F            alu 1/1
CMP reg, mem               0_1x_0011101x_xxx_xxx  F            ld 1/1
                                                               alu 1/2
CMP mem, reg               0_1x_0011100x_xxx_xxx  F            ld 1/1
                                                               alu 1/2
CMP AL/AX/EAX, imm         0_xx_0011110x_xxx_xxx  F            alu 1/1
CMP reg, imm               0_0x_100000xx_111_xxx  F            alu 1/1
CMP mem, imm               0_1x_100000xx_111_xxx  F            ld 1/1
                                                               alu 1/2
CWD/CDQ                    0_xx_10011001_xxx_xxx  F            alu1 1/1
DEC reg                    0_xx_01001xxx_xxx_xxx  F            alu 1/1
DEC reg                    0_0x_1111111x_001_xxx  F            alu 1/1
DEC mem                    0_1x_1111111x_001_xxx  F            ld 1/1
                                                               alu 1/2
                                                               st 1/1/3
IMUL AX, AL, reg           0_0x_11110110_101_xxx  F            fpfill 1/1/4
                                                               fmul 1/1/4
IMUL EDX:EAX, EAX, reg     0_0x_11110111_101_xxx  F            fpfill 1/1/4
                                                               fmul 1/1/4
IMUL reg, reg              1_0x_10101111_xxx_xxx  F            fpfill 1/1/4
                                                               fmul 1/1/4
Table 4-1. Integer Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcode | Execution Unit Timing
IMUL reg, reg, imm | 0_0x_011010x1_xxx_xxx | F | fpfill 1/1/4; fmul 1/1/4
IMUL AX, AL, mem | 0_1x_11110110_101_xxx | F | ld 1/1; fpfill 1/2/4; fmul 1/2/4
IMUL EDX:EAX, EAX, mem | 0_1x_11110111_101_xxx | F | ld 1/1; fpfill 1/2/4; fmul 1/2/4
IMUL reg, mem | 1_1x_10101111_xxx_xxx | F | ld 1/1; fpfill 1/2/4; fmul 1/2/4
IMUL reg, mem, imm | 0_1x_011010x1_xxx_xxx | F | ld 1/1; fpfill 1/2/4; fmul 1/2/4
INC reg | 0_xx_01000xxx_xxx_xxx | F | alu 1/1
INC reg | 0_0x_1111111x_000_xxx | F | alu 1/1
INC mem | 0_1x_1111111x_000_xxx | F | ld 1/1; alu 1/2; st 1/1/3
Jcc short displacement | 0_xx_0111xxxx_xxx_xxx | F | brn 1/1
Jcc long displacement | 1_xx_1000xxxx_xxx_xxx | F | brn 1/1
JCXZ short displacement | 0_xx_11100011_xxx_xxx | F | brn 1/1
JMP long displacement | 0_xx_11101001_xxx_xxx | F | brn 1/1
JMP short displacement | 0_xx_11101011_xxx_xxx | F | brn 1/1
JMP reg | 0_0x_11111111_100_xxx | F | brn 1/1
JMP mem | 0_1x_11111111_100_xxx | F | ld 1/1; brn 1/2
LEA | 0_1x_10001101_xxx_xxx | F | ld 1/1
LOOP short displacement | 0_xx_11100010_xxx_xxx | F | alu 1/1; brn 1/2
LOOPE short displacement | 0_xx_11100001_xxx_xxx | M | alu 1/1; brn 1/2
LOOPNE short displacement | 0_xx_11100000_xxx_xxx | M | alu 1/1; brn 1/2
LODS AL/AX/EAX, mem | 0_xx_1010110x_xxx_xxx | F | ld 1/1; alu 1/1
MOV reg, reg | 0_0x_100010xx_xxx_xxx | F | alu 1/1
Table 4-1. Integer Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcode | Execution Unit Timing
MOV reg, mem | 0_1x_1000101x_xxx_xxx | F | ld 1/1
MOV mem, reg | 0_10_1000100x_xxx_xxx | F | st 1/1
MOV mem, reg (base + index addressing) | 0_11_1000100x_xxx_xxx | F | ld 1/1; st 1/2/3
MOV AL/AX/EAX, mem | 0_xx_1010000x_xxx_xxx | F | ld 1/1
MOV mem, AL/AX/EAX | 0_xx_1010001x_xxx_xxx | F | st 1/1
MOV reg, imm | 0_0x_1100011x_000_xxx | F | alu 1/1
MOV reg, imm | 0_xx_1011xxxx_xxx_xxx | F | alu 1/1
MOV mem, imm | 0_10_1100011x_000_xxx | F | alu 1/1; st 1/1
MOV mem, imm (base + index addressing) | 0_11_1100011x_000_xxx | F | alu 1/1; ld 1/1; st 1/2/3
MOVSX reg, reg | 1_0x_1011111x_xxx_xxx | F | alu1 1/1
MOVSX reg, mem | 1_1x_1011111x_xxx_xxx | F | ld 1/1; alu1 1/2
MOVZX reg, reg | 1_0x_1011011x_xxx_xxx | F | alu 1/1
MOVZX reg, mem | 1_1x_1011011x_xxx_xxx | F | ld 1/1; alu 1/2
MUL AX, AL, reg | 0_0x_11110110_100_xxx | F | fpfill 1/1/4; fmul 1/1/4
MUL EDX:EAX, EAX, reg | 0_0x_11110111_100_xxx | F | fpfill 1/1/4; fmul 1/1/4
MUL AX, AL, mem | 0_1x_11110110_100_xxx | F | ld 1/1; fpfill 1/2/4; fmul 1/2/4
MUL EDX:EAX, EAX, mem | 0_1x_11110111_100_xxx | F | ld 1/1; fpfill 1/2/4; fmul 1/2/4
NEG reg | 0_0x_1111011x_011_xxx | F | alu 1/1
NEG mem | 0_1x_1111011x_011_xxx | F | ld 1/1; alu 1/2; st 1/1/3
NOP (XCHG EAX, EAX) | 0_xx_10010000_xxx_xxx | F | alu 1/1
NOT reg | 0_0x_1111011x_010_xxx | F | alu 1/1
Table 4-1. Integer Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcode | Execution Unit Timing
NOT mem | 0_1x_1111011x_010_xxx | F | ld 1/1; alu 1/2; st 1/1/3
OR reg, reg | 0_0x_000010xx_xxx_xxx | F | alu 1/1
OR reg, mem | 0_1x_0000101x_xxx_xxx | F | ld 1/1; alu 1/2
OR mem, reg | 0_1x_0000100x_xxx_xxx | F | ld 1/1; alu 1/2; st 1/1/3
OR AL/AX/EAX, imm | 0_xx_0000110x_xxx_xxx | F | alu 1/1
OR reg, imm | 0_0x_100000xx_001_xxx | F | alu 1/1
OR mem, imm | 0_1x_100000xx_001_xxx | F | ld 1/1; alu 1/2; st 1/1/3
POP reg | 0_xx_01011xxx_xxx_xxx | F | ld 1/1; alu 1/1
POP reg | 0_0x_10001111_000_xxx | F | ld 1/1; alu 1/1
POP mem | 0_1x_10001111_000_xxx | M | ld 1/1; ld 1/1; st 2/2/3; alu 2/2
PUSH reg | 0_xx_01010xxx_xxx_xxx | F | st 1/1; alu 1/1/2
PUSH reg | 0_0x_11111111_110_xxx | F | st 1/1; alu 1/1/2
PUSH imm | 0_xx_011010x0_xxx_xxx | F | alu 1/1; st 1/1/2; alu 1/1
PUSH mem | 0_1x_11111111_110_xxx | M | ld 1/1; st 1/1/2; alu 1/1
RET near | 0_xx_11000011_xxx_xxx | F | ld 1/1; alu 1/1; brn 1/2
RET near imm | 0_xx_11000010_xxx_xxx | M | ld 1/1; alu 1/1; alu 1/2; brn 1/2




AMD-K5 Processor Technical Reference Manual 



18524C/0- Nov 1996 



Table 4-1. Integer Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcode | Execution Unit Timing
ROL reg, 1 | 0_0x_1101000x_000_xxx | F | alu1 1/1
ROL mem, 1 | 0_1x_1101000x_000_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
ROL reg, imm | 0_0x_1100000x_000_xxx | F | alu1 1/1
ROL mem, imm | 0_1x_1100000x_000_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
ROL reg, CL | 0_0x_1101001x_000_xxx | F | alu1 1/1
ROL mem, CL | 0_1x_1101001x_000_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
ROR reg, 1 | 0_0x_1101000x_001_xxx | F | alu1 1/1
ROR mem, 1 | 0_1x_1101000x_001_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
ROR reg, imm | 0_0x_1100000x_001_xxx | F | alu1 1/1
ROR mem, imm | 0_1x_1100000x_001_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
ROR reg, CL | 0_0x_1101001x_001_xxx | F | alu1 1/1
ROR mem, CL | 0_1x_1101001x_001_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SAR reg, 1 | 0_0x_1101000x_111_xxx | F | alu1 1/1
SAR mem, 1 | 0_1x_1101000x_111_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SAR reg, imm | 0_0x_1100000x_111_xxx | F | alu1 1/1
SAR mem, imm | 0_1x_1100000x_111_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SAR reg, CL | 0_0x_1101001x_111_xxx | F | alu1 1/1
SAR mem, CL | 0_1x_1101001x_111_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SETcc reg | 1_0x_1001xxxx_xxx_xxx | F | brn 1/1
Table 4-1. Integer Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcode | Execution Unit Timing
SETcc mem | 1_1x_1001xxxx_xxx_xxx | F | brn 1/1; ld 1/1; st 1/2/3
SHL reg, 1 | 0_0x_1101000x_1x0_xxx | F | alu1 1/1
SHL mem, 1 | 0_1x_1101000x_1x0_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SHL reg, imm | 0_0x_1100000x_1x0_xxx | F | alu1 1/1
SHL mem, imm | 0_1x_1100000x_1x0_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SHL reg, CL | 0_0x_1101001x_1x0_xxx | F | alu1 1/1
SHL mem, CL | 0_1x_1101001x_1x0_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SHLD reg, reg, imm | 1_0x_10100100_xxx_xxx | F | alu1 1/1; alu1 2/2
SHLD mem, reg, imm | 1_1x_10100100_xxx_xxx | M | alu1 1/1; ld 1/1; alu1 2/2; st 2/2/3
SHLD reg, reg, CL | 1_0x_10100101_xxx_xxx | F | alu1 1/1; alu1 2/2
SHLD mem, reg, CL | 1_1x_10100101_xxx_xxx | M | alu1 1/1; ld 1/1; alu1 2/2; st 2/2/3
SHR reg, 1 | 0_0x_1101000x_101_xxx | F | alu1 1/1
SHR mem, 1 | 0_1x_1101000x_101_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SHR reg, imm | 0_0x_1100000x_101_xxx | F | alu1 1/1
SHR mem, imm | 0_1x_1100000x_101_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SHR reg, CL | 0_0x_1101001x_101_xxx | F | alu1 1/1
Table 4-1. Integer Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcode | Execution Unit Timing
SHR mem, CL | 0_1x_1101001x_101_xxx | F | ld 1/1; alu1 1/2; st 1/1/3
SHRD reg, reg, imm | 1_0x_10101100_xxx_xxx | F | alu1 1/1; alu1 2/2
SHRD mem, reg, imm | 1_1x_10101100_xxx_xxx | M | alu1 1/1; ld 1/1; alu1 2/2; st 2/2/3
SHRD reg, reg, CL | 1_0x_10101101_xxx_xxx | F | alu1 1/1; alu1 2/2
SHRD mem, reg, CL | 1_1x_10101101_xxx_xxx | M | alu1 1/1; ld 1/1; alu1 2/2; st 2/2/3
STOS mem, AL/AX/EAX | 0_xx_1010101x_xxx_xxx | F | st 1/1/3; alu 1/1
SUB reg, reg | 0_0x_001010xx_xxx_xxx | F | alu 1/1
SUB reg, mem | 0_1x_0010101x_xxx_xxx | F | ld 1/1; alu 1/2
SUB mem, reg | 0_1x_0010100x_xxx_xxx | F | ld 1/1; alu 1/2; st 1/1/3
SUB AL/AX/EAX, imm | 0_xx_0010110x_xxx_xxx | F | alu 1/1
SUB reg, imm | 0_0x_100000xx_101_xxx | F | alu 1/1
SUB mem, imm | 0_1x_100000xx_101_xxx | F | ld 1/1; alu 1/2; st 1/1/3
TEST reg, reg | 0_0x_1000010x_xxx_xxx | F | alu 1/1
TEST mem, reg | 0_1x_1000010x_xxx_xxx | F | ld 1/1; alu 1/2
TEST reg, imm | 0_0x_1111011x_00x_xxx | F | alu 1/1
TEST AL/AX/EAX, imm | 0_xx_1010100x_xxx_xxx | F | alu 1/1
TEST mem, imm | 0_1x_1111011x_00x_xxx | F | ld 1/1; alu 1/2
XCHG EAX, reg (except EAX) | 0_xx_10010xxx_xxx_xxx | F | alu 1/1; alu 1/1; alu 2/2
Table 4-1. Integer Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcode | Execution Unit Timing
XCHG reg, reg | 0_0x_1000011x_xxx_xxx | F | alu 1/1; alu 1/1; alu 2/2
XCHG mem, reg | 0_1x_1000011x_xxx_xxx | F | ld 1/1; st 1/1/2; alu 1/2
XOR reg, reg | 0_0x_001100xx_xxx_xxx | F | alu 1/1
XOR reg, mem | 0_1x_0011001x_xxx_xxx | F | ld 1/1; alu 1/2
XOR mem, reg | 0_1x_0011000x_xxx_xxx | F | ld 1/1; alu 1/2; st 1/1/3
XOR AL/AX/EAX, imm | 0_xx_0011010x_xxx_xxx | F | alu 1/1
XOR reg, imm | 0_0x_100000xx_110_xxx | F | alu 1/1
XOR mem, imm | 0_1x_100000xx_110_xxx | F | ld 1/1; alu 1/2; st 1/1/3
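The Opcode Format column can be read as a bit pattern in which x marks a don't-care bit. The sketch below is a hypothetical C helper, `matches_format`, that tests one instruction encoding against such a pattern. Its field layout is an assumption inferred from the table entries; this section does not state it. The assumed fields are:
- a leading 0F-prefix flag (1 for two-byte opcodes such as BTR, 0Fh B3h);
- two operand-location bits (0x for a register operand and 1x for memory, with 10 and 11 apparently separating the plain and base + index forms of MOV stores);
- the opcode byte;
- the ModR/M reg field;
- the ModR/M r/m field.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from AMD documentation): tests one instruction
 * against a Table 4-1 Opcode Format pattern such as "0_0x_001110xx_xxx_xxx".
 * Assumed layout, inferred from the table entries:
 *   0F-prefix flag _ operand-location bits _ opcode byte _ ModR/M reg _ ModR/M r/m
 * An 'x' in the pattern is a don't-care bit. */
static bool matches_format(const char *pattern, bool two_byte, unsigned loc,
                           uint8_t opcode, uint8_t modrm)
{
    char bits[22];            /* 17 bits + 4 underscores + terminating NUL */
    int n = 0;

    bits[n++] = two_byte ? '1' : '0';
    bits[n++] = '_';
    for (int i = 1; i >= 0; i--)          /* two operand-location bits */
        bits[n++] = ((loc >> i) & 1u) ? '1' : '0';
    bits[n++] = '_';
    for (int i = 7; i >= 0; i--)          /* opcode byte, MSB first */
        bits[n++] = ((opcode >> i) & 1u) ? '1' : '0';
    bits[n++] = '_';
    for (int i = 5; i >= 3; i--)          /* ModR/M reg field (bits 5-3) */
        bits[n++] = ((modrm >> i) & 1u) ? '1' : '0';
    bits[n++] = '_';
    for (int i = 2; i >= 0; i--)          /* ModR/M r/m field (bits 2-0) */
        bits[n++] = ((modrm >> i) & 1u) ? '1' : '0';
    bits[n] = '\0';

    if (strlen(pattern) != (size_t)n)
        return false;
    for (int i = 0; i < n; i++) {
        if (pattern[i] == 'x')
            continue;                     /* don't-care position */
        if (pattern[i] != bits[i])
            return false;
    }
    return true;
}
```

Two examples:
- CMP ECX, EAX encodes as opcode 39h with ModR/M C1h, so it matches the CMP reg, reg pattern.
- The BTR reg, imm encoding 0Fh BAh with reg field 110 matches its own row but not the BTS reg, imm row, whose reg field is 101.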



4.2.3 Integer Dot Product Example 

This example illustrates an optimal code sequence for an integer dot product operation. The sequence performs multiply-accumulate operations (MACs) at the rate of one every 3 cycles. In this example, the array size is a constant. The loop is unrolled so that separate MAC operations proceed in parallel for even and odd elements. The final sum is generated outside the loop, as is the final iteration for odd-sized arrays.



mac_loop:
        MOV     EAX, [ESI][ECX*4]       ; load A(i)
        MOV     EBX, [ESI][ECX*4]+4     ; load A(i+1)
        IMUL    EAX, [EDI][ECX*4]       ; A(i) * B(i)
        IMUL    EBX, [EDI][ECX*4]+4     ; A(i+1) * B(i+1)
        ADD     ECX, 2                  ; increment index
        ADD     EDX, EAX                ; even sum
        ADD     EBP, EBX                ; odd sum
        CMP     ECX, EVEN_ARRAY_SIZE    ; loop control
        JL      mac_loop                ; jump
        ; do final MAC here for odd-sized arrays
        ADD     EDX, EBP                ; final sum
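For reference, the same algorithm can be written in C. This is an illustrative sketch, not AMD code. The function name `dot_product` is hypothetical, and the arrays are assumed to hold 32-bit signed integers. Unsigned arithmetic is used so that overflow wraps the way 32-bit IMUL and ADD results do. The two accumulators correspond to EDX (even sum) and EBP (odd sum), so the two multiplies in each iteration do not depend on each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of the unrolled MAC loop (not AMD code).
 * even_sum plays the role of EDX and odd_sum the role of EBP, so the two
 * multiplies in each iteration are independent. Unsigned 32-bit arithmetic
 * gives the same wraparound as 32-bit IMUL/ADD. */
static int32_t dot_product(const int32_t *a, const int32_t *b, size_t n)
{
    uint32_t even_sum = 0, odd_sum = 0;
    size_t even_size = n & ~(size_t)1;   /* EVEN_ARRAY_SIZE */
    size_t i = 0;                        /* ECX */

    for (; i < even_size; i += 2) {
        even_sum += (uint32_t)a[i] * (uint32_t)b[i];          /* A(i) * B(i) */
        odd_sum  += (uint32_t)a[i + 1] * (uint32_t)b[i + 1];  /* A(i+1) * B(i+1) */
    }
    if (n & 1)                           /* final MAC for odd-sized arrays */
        even_sum += (uint32_t)a[i] * (uint32_t)b[i];

    /* Final sum. Converting back to int32_t relies on the usual
       two's-complement behavior of mainstream compilers. */
    return (int32_t)(even_sum + odd_sum);
}
```

The split accumulators matter only for the hardware schedule. Any C compiler is free to reorder the additions, so this version shows the data flow rather than the exact K5 instruction sequence.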
Table 4-2 shows the timing of internal operations, from dispatch to retirement, of each ROP for nearly two iterations of this loop. All memory accesses are assumed to hit in the cache. EVEN_ARRAY_SIZE is set to 20.



Table 4-2. Integer Dot Product Internal Operations Timing 



Instruction 


Cycle 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


11 


12 


13 


14 


MOV EAX,[ESI][ECX*4] 


L 


> 


- 


- 


- 


! 


















MOV EBX,[ESI][ECX*4]+4 


L 


> 


- 


- 


- 


! 


















IMUL EAX,[EDI][ECX*4] 




L 


> 


- 


- 


! 




















- 


M 


M 


M 


M 


> 


j 














IMUL EBX,[EDI][ECX*4]+4 






L 


> 


- 


- 


- 


! 


















- 


M 


M 


M 


M 


> 


! 












ADD ECX,2 






A 


> 


- 


- 


- 


! 














ADD EDX,EAX 










- 


- 


- 


A 


> 


! 










ADD EBP,EBX 










- 


- 


- 


A 


> 


! 










CMP ECX,20 












- 


- 


- 


A 


> 


j 








JL LOOP 












- 


- 


- 


- 


B 


> 


! 






MOV EAX,[ESI][ECX*4] 














L 


> 


- 


- 


- 


! 






MOV EBX,[ESI][ECX*4]+4 














L 


> 


- 


- 


- 


! 






IMUL EAX,[EDI][ECX*4] 
















L 


> 


- 


- 


j 




















- 


M 


M 


M 


M 


> 


j 


IMUL EBX,[EDI][ECX*4]+4


















L 


> 


- 


- 


- 


! 


















- 


M 


M 


M 


M 


> 



Notes:
L - load execute
M - multiply execute
A - ALU execute
B - branch execute
> - result
I - retire (update real state)
- - waiting in the reservation station, before or after execution
4.2.4 Floating-Point Instructions

Floating-point ROPs are always dispatched in pairs to the FPU reservation station. The first ROP conveys the lower halves of the A and B operands, and it always has the fpfill ROP type. The second ROP conveys the upper halves of the operands, as well as the numeric opcode. Data from both ROPs is merged in the reservation station. It must then be converted into an internal floating-point format before it can be issued to the add pipe (fadd), multiply pipe (fmul), or detect pipe (fmv).

The conversion takes one cycle. This delay is incurred whenever the data comes from the register file or from one of the other functional units (e.g., load/store, ALU). If data is being forwarded from the FPU itself, however, no format conversion is required. In that case, operands are fast-forwarded from the back end of one pipe to the front of any other pipe without the one-cycle delay.
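The pairing can be pictured with a small C model. The sketch is purely illustrative, and the struct and function names are hypothetical. It assumes double-precision (64-bit) operands split into 32-bit halves, a width this section does not state. The fpfill ROP carries the low halves of A and B. The second ROP carries the high halves plus the opcode. The merge step reassembles the operands; the subsequent internal format conversion is not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative model only; names and 32-bit payload widths are assumptions. */
typedef struct { uint32_t a_lo, b_lo; } fpfill_rop;               /* first ROP */
typedef struct { uint32_t a_hi, b_hi; int opcode; } fp_op_rop;    /* second ROP */

/* Split two 64-bit operands across the ROP pair. */
static void split_operands(double a, double b, int opcode,
                           fpfill_rop *first, fp_op_rop *second)
{
    uint64_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    first->a_lo = (uint32_t)ua;             /* low 32 bits of A */
    first->b_lo = (uint32_t)ub;             /* low 32 bits of B */
    second->a_hi = (uint32_t)(ua >> 32);    /* high 32 bits of A */
    second->b_hi = (uint32_t)(ub >> 32);    /* high 32 bits of B */
    second->opcode = opcode;                /* numeric opcode travels with the high halves */
}

/* Merge step, performed once both ROPs are present in the reservation station. */
static void merge_operands(const fpfill_rop *first, const fp_op_rop *second,
                           double *a, double *b)
{
    uint64_t ua = ((uint64_t)second->a_hi << 32) | first->a_lo;
    uint64_t ub = ((uint64_t)second->b_hi << 32) | first->b_lo;
    memcpy(a, &ua, sizeof *a);
    memcpy(b, &ub, sizeof *b);
}
```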

The add, subtract, and reverse-subtract FPU latencies assume that cancellation does not occur in the adder/subtractor. If cancellation does occur, an extra cycle is required to normalize the result.
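Cancellation occurs when subtracting operands of nearly equal magnitude leaves leading zero bits that must be shifted out to normalize the result. The following C sketch measures how many bit positions the result's exponent falls below the larger operand's exponent. It uses IEEE 754 double precision rather than the FPU's internal format. A positive value indicates the kind of cancellation that would need the extra normalization step described above. The function names are hypothetical, and zero, denormal, and infinite values are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Biased exponent field of an IEEE 754 double (normal, nonzero values only). */
static int biased_exponent(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)((bits >> 52) & 0x7FF);
}

/* How far the result of a - b must be shifted left to renormalize, relative to
 * the larger operand's exponent. Positive: cancellation occurred. Zero: no
 * shift. Negative: the magnitude grew (a carry-out, not cancellation). */
static int cancellation_shift(double a, double b)
{
    int ea = biased_exponent(a);
    int eb = biased_exponent(b);
    int emax = ea > eb ? ea : eb;
    return emax - biased_exponent(a - b);
}
```

For example:
- 1.5 - 1.0 loses one leading bit.
- (1 + 2^-10) - 1.0 loses ten leading bits.
- 3.0 - 1.0 needs no normalization shift.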

Table 4-3 shows the execution-unit usage for each floating-point instruction, along with relative cycle numbers for dispatch and execution of the associated ROPs for the instruction.



Table 4-3. Floating-Point Instructions

Instruction Mnemonic | Opcode Format | Fastpath or Microcoded | Execution Unit Timing
FABS | 0_0x_11011001_100_xxx | F | fpfill 1/2/4; fmv 1/2/4
FADD ST, ST(i) | 0_0x_11011000_000_xxx | F | fpfill 1/2/5; fadd 1/2/5
FADD ST(i), ST | 0_0x_11011100_000_xxx | F | fpfill 1/2/5; fadd 1/2/5
FADD real_32 | 0_1x_11011000_000_xxx | F | ld 1/1; fpfill 1/3/6; fadd 1/3/6
FADD real_64 | 0_1x_11011100_000_xxx | M | ld 1/1; ld 1/2; fpfill 1/4/7; fadd 1/4/7
Table 4-3. Floating-Point Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcoded | Execution Unit Timing
FADDP ST(i), ST | 0_0x_11011110_000_xxx | F | fpfill 1/2/5; fadd 1/2/5
FCHS | 0_0x_11011001_100_xxx | F | fpfill 1/2/4; fchs 1/2/4
FCOM ST(i) | 0_0x_11011x00_010_xxx | F | fpfill 1/2/4; fcmpst 1/2/4
FCOM real_32 | 0_1x_11011000_010_xxx | F | ld 1/1; fpfill 1/3/5; fmv 1/3/5
FCOM real_64 | 0_1x_11011100_010_xxx | M | ld 1/1; ld 1/2; fpfill 1/4/6; fadd 1/4/6
FCOMP ST(i) | 0_0x_11011x00_011_xxx | F | fpfill 1/2/4; fmv 1/2/4; alu 1/1
FCOMP real_32 | 0_1x_11011000_011_xxx | F | ld 1/1; fpfill 1/3/5; fmv 1/3/5
FCOMP real_64 | 0_1x_11011100_011_xxx | M | ld 1/1; ld 1/2; fpfill 1/4/6; fadd 1/4/6
FCOMPP | 0_0x_11011110_011_xxx | F | fpfill 1/2/4; fmv 1/2/4; nop 1/1/2
FDECSTP | 0_0x_11011001_110_xxx | M | alu 1/1/2; alu 1/1/2
FIADD int_16 | 0_1x_11011110_000_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/10; fadd 2/7/10
FIADD int_32 | 0_1x_11011010_000_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/10; fadd 2/7/10
Table 4-3. Floating-Point Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcoded | Execution Unit Timing
FICOM int_16 | 0_1x_11011110_010_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/9; fmv 2/7/9
FICOM int_32 | 0_1x_11011010_010_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/9; fmv 2/7/9
FICOMP int_16 | 0_1x_11011110_011_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/9; fmv 2/7/9
FICOMP int_32 | 0_1x_11011010_011_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/9; fmv 2/7/9
FILD int_16 | 0_1x_11011111_000_xxx | F | ld 1/1; fpfill 1/3/7; fadd 1/3/7
FILD int_32 | 0_1x_11011011_000_xxx | F | ld 1/1; fpfill 1/3/7; fadd 1/3/7
FILD int_64 | 0_1x_11011111_101_xxx | M | ld 1/1; ld 1/2; fpfill 1/4/8; fadd 1/4/8
FIMUL int_16 | 0_1x_11011110_001_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/11; fmul 2/7/11
FIMUL int_32 | 0_1x_11011010_001_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/11; fmul 2/7/11
Table 4-3. Floating-Point Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcoded | Execution Unit Timing
FIST int_16 | 0_1x_11011111_010_xxx | M | ld 1/1; fpfill 1/2/5; fadd 1/2/5; st 1/5/6
FIST int_32 | 0_1x_11011011_010_xxx | M | ld 1/1; fpfill 1/2/5; fadd 1/2/5; st 1/5/6
FISTP int_16 | 0_1x_11011111_011_xxx | M | ld 1/1; fpfill 1/2/5; fadd 1/2/5; st 1/5/6
FISTP int_32 | 0_1x_11011011_011_xxx | M | ld 1/1; fpfill 1/2/5; fadd 1/2/5; st 1/5/6
FISTP int_64 | 0_1x_11011111_111_xxx | M | ld 1/1; ld 1/2; fpfill 1/2/5; fadd 1/2/5; st 2/3/6; st 2/4/7
FISUB int_16 | 0_1x_11011110_100_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/10; fadd 2/7/10
FISUB int_32 | 0_1x_11011010_100_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/10; fadd 2/7/10
FISUBR int_16 | 0_1x_11011110_101_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/10; fadd 2/7/10
Table 4-3. Floating-Point Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcoded | Execution Unit Timing
FISUBR int_32 | 0_1x_11011010_101_xxx | M | ld 1/1; fpfill 1/3/7; fadd 1/3/7; fpfill 2/7/10; fadd 2/7/10
FLD real_32 | 0_1x_11011001_000_xxx | F | ld 1/1; fpfill 1/3/5; fmv 1/3/5
FLD real_64 | 0_1x_11011101_000_xxx | M | ld 1/1; ld 1/2; fpfill 1/4/6; fmv 1/4/6
FLD real_80 | 0_1x_11011011_101_xxx | M | ld 1/1; ld 1/2; fpfill 1/6/8; fmv 1/6/8
FLD ST(i) | 0_0x_11011001_000_xxx | F | fpfill 1/2/4; fmv 1/2/4; nop 1/1
FMUL ST, ST(i) | 0_0x_11011000_001_xxx | F | fpfill 1/2/8; fmul 1/2/8
FMUL ST(i), ST | 0_0x_11011100_001_xxx | F | fpfill 1/2/8; fmul 1/2/8
FMUL real_32 | 0_1x_11011000_001_xxx | F | ld 1/1; fpfill 1/3/7; fmul 1/3/7
FMUL real_64 | 0_1x_11011100_001_xxx | M | ld 1/1; ld 1/2; fpfill 1/4/10; fmul 1/4/10
FMULP ST, ST(i) | 0_0x_11011110_001_xxx | F | fpfill 1/2/8; fmul 1/2/8
FMULP ST(i), ST | 0_0x_11011110_001_xxx | F | fpfill 1/2/8; fmul 1/2/8
FNOP | 0_0x_11011001_010_xxx | F | alu 1/1/2; alu 1/1/2
FRNDINT | 0_0x_11011001_111_xxx | F | fpfill 1/2/9; fadd 1/2/9
Table 4-3. Floating-Point Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcoded | Execution Unit Timing
FSCALE | 0_0x_11011001_111_xxx | F | fpfill 1/2/8; fadd 1/2/8
FST real_32 | 0_1x_11011001_010_xxx | M | ld 1/1; fpfill 1/2/4; fmv 1/2/4; st 1/2/5
FST ST(i) | 0_0x_11011101_010_xxx | F | fpfill 1/2/4; fmv 1/2/4
FSTP real_32 | 0_1x_11011001_011_xxx | M | ld 1/1; fpfill 1/2/4; fmv 1/2/4; st 1/2/5
FSTP real_64 | 0_1x_11011101_011_xxx | M | ld 1/1; ld 1/2; fpfill 1/2/4; fmv 1/2/4; st 2/3/5; st 2/4/6
FSTP real_80 | 0_1x_11011011_111_xxx | M | ld 1/1; ld 1/2; fpfill 1/2/4; fmv 1/2/4; st 2/3/5; st 2/4/6
FSTP ST(i) | 0_0x_11011x01_011_xxx | F | fpfill 1/2/4; fmv 1/2/4
FSUB ST, ST(i) | 0_0x_11011000_100_xxx | F | fpfill 1/2/5; fadd 1/2/5
FSUB ST(i), ST | 0_0x_11011100_100_xxx | F | fpfill 1/2/5; fadd 1/2/5
FSUB real_32 | 0_1x_11011000_100_xxx | F | ld 1/1; fpfill 1/3/6; fadd 1/3/6
FSUB real_64 | 0_1x_11011100_100_xxx | M | ld 1/1; ld 1/2; fpfill 1/4/7; fadd 1/4/7
FSUBP ST(i), ST | 0_0x_11011110_100_xxx | F | fpfill 1/2/5; fadd 1/2/5
Table 4-3. Floating-Point Instructions (continued)

Instruction Mnemonic | Opcode Format | Fastpath or Microcoded | Execution Unit Timing
FSUBR ST, ST(i) | 0_0x_11011000_101_xxx | F | fpfill 1/2/5; fadd 1/2/5
FSUBR ST(i), ST | 0_0x_11011100_101_xxx | F | fpfill 1/2/5; fadd 1/2/5
FSUBR real_32 | 0_1x_11011000_101_xxx | F | ld 1/1; fpfill 1/3/6; fadd 1/3/6
FSUBR real_64 | 0_1x_11011100_101_xxx | M | ld 1/1; ld 1/2; fpfill 1/4/7; fadd 1/4/7
FSUBRP ST(i), ST | 0_0x_11011110_101_xxx | F | fpfill 1/2/5; fadd 1/2/5
FTST | 0_0x_11011001_100_xxx | F | fpfill 1/2/4; fmv 1/2/4
FUCOM ST(i) | 0_0x_11011101_100_xxx | F | fpfill 1/2/4; fmv 1/2/4
FUCOMP ST(i) | 0_0x_11011101_101_xxx | F | fpfill 1/2/4; fmv 1/2/4; nop 1/1
FUCOMPP | 0_0x_11011010_101_xxx | F | fpfill 1/2/4; fmv 1/2/4; nop 1/1
FWAIT | 0_xx_10011011_xxx_xxx | F | alu 1/1
FXAM | 0_0x_11011001_100_xxx | F | fpfill 1/2/4; fmv 1/2/4
FXCH ST(i) | 0_0x_11011001_001_xxx | F | brn 1/1
FXTRACT | 0_0x_11011001_110_xxx | M | fpfill 1/2/4; fmv 1/2/4; fpfill 2/3/11; fadd 2/3/11; fpfill 3/4/6; fmv 3/4/6
Bus Interface

This chapter describes two closely related subjects: bus signals (Sections 5.1 and 5.2) and the bus-cycle protocols implemented with those signals (Sections 5.3 and 5.4). These sections describe only the architectural characteristics and functions of the signals and bus cycles. The processor data sheet defines the setup and hold times for signals.

Throughout this chapter, unless otherwise stated, the term clock refers to bus-clock (CLK) cycles, not processor-clock cycles. The term cycle refers to bus cycles, not clock cycles. The terms asserted and negated mean that a signal's target samples it as asserted or negated on the signal's active clock edge, which is typically the rising edge.
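Because asserted and negated refer to what the target samples on the active clock edge, a level change between edges is never observed. The C sketch below counts how many rising CLK edges see a signal asserted, given waveforms recorded at the same time steps. The function name is hypothetical. The `active_low` flag reflects the general convention that "asserted" means a signal's active level, which may be High or Low.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts the rising CLK edges at which `sig` is sampled asserted.
 * clk[t] and sig[t] are logic levels (true = High) at the same time step t.
 * active_low selects whether Low is the asserted level.
 * Levels that change between rising edges are never sampled. */
static size_t count_sampled_asserted(const bool *clk, const bool *sig,
                                     size_t n, bool active_low)
{
    size_t count = 0;
    for (size_t t = 1; t < n; t++) {
        bool rising = !clk[t - 1] && clk[t];   /* Low-to-High transition */
        bool asserted = (sig[t] != active_low); /* High if active-high, Low if active-low */
        if (rising && asserted)
            count++;
    }
    return count;
}
```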
5.1 Signal Overview

The signals on the AMD-K5 processor are compatible with the comparable signals on the 296-pin socket of the Pentium (735\90, 815\100) processor. Appendix A gives a complete list of hardware and software issues relating to this compatibility. The following figures and tables summarize the characteristics and behavior of the AMD-K5 processor's signals:

■ Figure 5-1 (Signal Groups) summarizes the processor's signals, showing the functional group to which each signal belongs.

■ Table 5-1 (Summary of Signal Characteristics) shows each signal's I/O type, when it is sampled, driven, and floated, and its internal resistor (if any).

■ Table 5-2 on page 5-8 (Conditions for Driving and Sampling Signals) shows the states and bus cycles during which the processor effectively drives or samples each signal.

■ Table 5-3 on page 5-16 (Summary of Interrupts and Exceptions) shows the priority and characteristics of interrupts and exceptions.
5.1.1 Signal Characteristics

Table 5-1. Summary of Signal Characteristics

Signal | Type | Sampled (Input) or Asserted (Output)² | Internal Resistor | Floated³
A20M¹ | I | Every clock. | - | -
A31-A3 | I/O | Output: From ADS until the last expected BRDY of the bus cycle. Input: Same clock as EADS. A4-A3 are disabled for input. | - | AHOLD+1, BOFF+1, or HLDA
ADS | O | First clock of bus cycle. | - | BOFF+1 or HLDA
ADSC | O | First clock of bus cycle. | - | BOFF+1 or HLDA
AHOLD | I | Every clock. | - | -
AP | I/O | (same as A31-A3) | - | AHOLD+1, BOFF+1, or HLDA
APCHK | O | Two clocks after EADS, for one clock. | - | -
BE7-BE0 | O | From ADS until the last expected BRDY of the bus cycle. | - | BOFF+1 or HLDA
BF (BF1-BF0) | I | Falling edge of RESET. | pullup | -
BOFF | I | Every clock. | - | -
BRDY | I | Every clock, from one clock after ADS until the last expected BRDY of the bus cycle. | - | -
BRDYC | I | (same as BRDY) | pullup | -
BREQ | O | First clock of every bus cycle (same as ADS), cache store, cache-tag recovery, and aliased cache load. Asserted continuously while the processor is held off the bus and needs access to continue. | - | -
BUSCHK | I | Every BRDY. Recognized at the next instruction boundary. | pullup | -
CACHE | O | From ADS until the last expected BRDY of the bus cycle. Driven for all reads; driven for writes only during writebacks. | - | BOFF+1 or HLDA
CLK | I | Always. | - | -

Notes:
1. Can be driven asynchronously or synchronously.
2. The term clock means bus clock (CLK). "+n" means n CLKs later.
3. "+n" means n CLKs after the named signal is sampled active. All outputs and bidirectionals are floated during the float test (FLUSH at RESET).
Table 5-1. Summary of Signal Characteristics (continued)

Signal | Type | Sampled (Input) or Asserted (Output)² | Internal Resistor | Floated³
D/C | O | From ADS until the last expected BRDY of the bus cycle. | - | BOFF+1 or HLDA
D63-D0 | I/O | Output (single transfer): From one clock after ADS until BRDY. Output (burst transfer): From one clock after ADS until the first BRDY, and thereafter from one clock after each BRDY until the next BRDY. Input: Every BRDY. | - | BOFF+1 or HLDA
DP7-DP0 | I/O | (same as D63-D0) | - | BOFF+1 or HLDA
EADS | I | Every clock while AHOLD, BOFF, or HLDA is asserted. Sampling begins two clocks after the assertion of AHOLD, two clocks after the assertion of BOFF, or one clock after the assertion of HLDA. EADS is not sampled while the processor drives A31-A3, while it asserts HITM, or in the clock after EADS. | - | -
EWBE | I | With BRDY of external write cycles, and in every clock thereafter until EWBE is asserted. | - | -
FERR | O | Every clock. | - | -
FLUSH¹ | I | Every clock. Falling-edge-triggered. Recognized at the next instruction boundary. Acknowledged with a Flush-Acknowledge special bus cycle. | - | -
FRCMC¹ | I | Every clock in which RESET is asserted. | - | -
HIT | O | Every clock. Changes state two clocks after EADS and retains that state until two clocks after the next EADS. | - | -
HITM | O | Every clock. Changes state two clocks after EADS and retains that state until one clock after the last BRDY of the writeback. | - | -
HLDA | O | From two clocks after the last BRDY of an in-progress bus cycle, or two clocks after HOLD, whichever comes last, until two clocks after HOLD is negated. | - | -
HOLD | I | Every clock. Acknowledged with HLDA. | - | -
IERR | O | Every clock, in Functional-Redundancy Checking mode. | - | -
Table 5-1. Summary of Signal Characteristics (continued)

Signal | Type | Sampled (Input) or Asserted (Output)² | Internal Resistor | Floated³
IGNNE¹ | I | Every clock. | - | -
INIT¹ | I | Every clock. Rising-edge-triggered. Recognized at the next instruction boundary. | - | -
INTR¹ | I | Every clock. Level-sensitive. Recognized at the next instruction boundary. Acknowledged with an interrupt acknowledge operation. | - | -
INV | I | Every EADS. | - | -
KEN | I | First BRDY or NA of the bus cycle, whichever comes first. Recognized only during read cycles. | - | -
LOCK | O | From ADS until the last expected BRDY of the bus cycle. Negated for one clock (dead cycle) between sequential locked operations. | - | BOFF+1 or HLDA
M/IO | O | From ADS until the last expected BRDY of the bus cycle. | - | BOFF+1 or HLDA
NA | I | From one clock after ADS until the first expected BRDY of a bus cycle. The only function of NA is to validate KEN or WB/WT in place of BRDY. | - | -
NMI¹ | I | Every clock. Rising-edge-triggered. Recognized at the next instruction boundary. | - | -
PCD | O | From ADS until the last expected BRDY of the bus cycle. | - | BOFF+1 or HLDA
PCHK | O | Two clocks after every BRDY of read cycles. | - | -
PEN | I | Every BRDY of read cycles, and the second BRDY of an interrupt acknowledge operation. | - | -
PRDY | O | Every clock, in response to R/S. Asserted at the instruction boundary after R/S is sampled Low. Negated in the clock after R/S is sampled High. | - | -
PWT | O | From ADS until the last expected BRDY of the bus cycle. | - | BOFF+1 or HLDA
R/S¹ | I | Every clock. Level-sensitive. Recognized at the next instruction boundary. Acknowledged with PRDY. | pullup | -
RESET¹ | I | Every clock. Recognized at the next instruction boundary. | - | -
Table 5-1. Summary of Signal Characteristics (continued)

Signal | Type | Sampled (Input) or Asserted (Output)² | Internal Resistor | Floated³
SCYC | O | From ADS until the last expected BRDY of the bus cycle. | - | BOFF+1 or HLDA
SMI¹ | I | Every clock. Falling-edge-triggered. Recognized at the next instruction boundary. Acknowledged with SMIACT. | pullup | -
SMIACT | O | From one clock after the last expected BRDY of the bus cycle, while EWBE is asserted, until the return from the SMM interrupt handler. | - | -
STPCLK¹ | I | Every clock. Level-sensitive. Recognized at the next instruction boundary. Acknowledged with a Stop Grant special bus cycle. | pullup | -
TCK | I | Always. | pullup | -
TDI | I | Every rising TCK edge during the shift_IR and shift_DR states. | pullup | -
TDO | O | Every falling TCK edge during the shift_IR and shift_DR states. | - | While not in the shift_IR or shift_DR state.
TMS | I | Every rising TCK edge. | pullup | -
TRST | I | Always sampled asynchronously. | pullup | -
W/R | O | From ADS until the last expected BRDY of the bus cycle. | - | BOFF+1 or HLDA
WB/WT | I | First BRDY or NA of the bus cycle, whichever comes first. | - | -
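Two of the timing rules in Table 5-1 (HLDA assertion and the start of EADS sampling) reduce to simple clock arithmetic. The C sketch below encodes only what the table rows state and ignores every other qualifying condition. Clock values are bus-clock (CLK) indices, and all function and enum names are hypothetical.

```c
#include <stdbool.h>

/* Simplified model of two Table 5-1 timing rules; clocks are CLK indices. */

/* HLDA is asserted from two clocks after the last BRDY of an in-progress bus
 * cycle, or two clocks after HOLD, whichever comes last. */
static long hlda_assert_clock(long hold_clk, bool cycle_in_progress,
                              long last_brdy_clk)
{
    long from_hold = hold_clk + 2;
    if (!cycle_in_progress)
        return from_hold;
    long from_brdy = last_brdy_clk + 2;
    return from_brdy > from_hold ? from_brdy : from_hold;
}

/* HLDA remains asserted until two clocks after HOLD is negated. */
static long hlda_release_clock(long hold_negated_clk)
{
    return hold_negated_clk + 2;
}

/* First clock at which EADS is sampled after a hold-type signal is asserted:
 * two clocks after AHOLD or BOFF, one clock after HLDA. */
enum hold_kind { HOLD_AHOLD, HOLD_BOFF, HOLD_HLDA };

static long eads_first_sample_clock(enum hold_kind kind, long asserted_clk)
{
    switch (kind) {
    case HOLD_AHOLD: return asserted_clk + 2;
    case HOLD_BOFF:  return asserted_clk + 2;
    case HOLD_HLDA:  return asserted_clk + 1;
    }
    return -1; /* not reached for valid enum values */
}
```

For example, if HOLD is sampled at clock 10 while a bus cycle whose last BRDY falls at clock 14 is in progress, HLDA is asserted at clock 16. EADS sampling can then begin at clock 17.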



5.1.2 Conditions for Driving and Sampling Signals

Table 5-2 shows the processor states, signal states, and bus cycles during which the processor can drive or sample each signal. The table indicates when signals can be driven or sampled so that their state has some practical (meaningful) effect on the state of the processor or on the bus cycle being driven or sampled. In Table 5-2, shading indicates signals that are meaningfully driven or sampled. Signals that are not shaded are either not driven or sampled, or not meaningful. For details on how each signal behaves, see Section 5.2 starting on page 5-17.






Table 5-2. Conditions for Driving and Sampling Signals 



Signal 


Conditions under which signals are meaningfully driven or sampled 


Bus Cycles or Cache Accesses 38 


Arbitration 


States and Modes 8 


Reset, 

Debug 


Memory Reads 14 


Memory Writes 14 




Inquire Cycles 3 


I/O Cycles 


Locked Cycles 


Special Cycles 


Interrupt Acknow. 


AHOLD Active 


BOFF Active 


HLDA Active 




Stop Grant 


Stop Clock 


SMIACT Active 


RESET Active 


INIT Active


PRDY Active 


Bus Arbitration 


AHOLD 


1 












23 






- 






















BOFF 


1 




















- 




















BREQ 


0 






38 


































HLDA 


0 






39 














35 


- 


















HOLD 


1 




















35 




















Address and Address Parity


A20M 


1 


10 


10 


10 


10 




10 






10 


10 


10 
















10 


A31-A3 2


I/O 






44 


19 






19 




7 


4 


4 


3 


3 


3 












AP 


I/O 






38 












7 


4 


4 


3 


3 


3 












ADS 


0 






38 


37 










3 






3 


3 


3 












ADSC


0 






38 


37 










3 






3 


3 


3 












APCHK 


0 












7 






3 


3 


3 


3 


3 


3 




3 






3 


BE7-BE0 








38 


37 










16 






3 


3 


3 












Cycle Definition and Control 


D/C 


0 






38 


37 










16 






3 


3 


3 












EWBE 


1 








37 






26 


26 








3 


3 


3 












LOCK 


0 






38 


1 




- 






16 






















M/IO


0 






38 


37 










16 






3 


3 


3 












NA 18 


1 


18 


18 


18 












16 














18 








SCYC 


0 


13 


13 






13 








13 














13 








W/R 


0 






38 


37 










16 






3 


3 


3 
Table 5-2. Conditions for Driving and Sampling Signals (continued)


Cache Control




CACHE 


0 






38 


37 


25 


25 


25 


25 


16 






3 


3 


3 










21 


KEN 42 


1 


















16 




















21 


PCD 


0 






38 












16 






3 


3 


3 










21 


PWT 


0 






38 












16 






3 


3 


3 










15 


WB/WT


1 






38 












16 




















15 


Data and Data Parity 


BRDY 


1 






38 


37 










16 






3 


3 


3 












BRDYC 


1 






38 


37 










16 






3 


3 


3 












D63-D0 


I/O 






38 


37 










16 






3 


3 


3 












DP7-DP0 


I/O 






38 


37 










16 






3 


3 


3 












PCHK 42 


0 


















16 






















PEN 42 


1 


















16 






















Inquire Cycles 


EADS 2 


1 


43 


43 


43 




43 


1 


43 


43 
























HIT 


0 












1 




























HITM


0 












1 




























INV 


1 


43 


43 


43 




43 


1 


43 


43 
























Floating-Point Errors 


FERR 


0 








































IGNNE 


1 
Table 5-2. Conditions for Driving and Sampling Signals (continued)


External Interrupts, Interrupt Acknowledgments, and Reset 


BUSCHK 29 


1 






38 


29 










16 






3 


12 


12 












FLUSH 27


1 








41 










41 


41 


41 






12 












INIT 27


1 








30 










30 


30 


30 






12 




9 




- 




INTR 5, 28


1 








40 










40 


40 


40 


















NMI 27 


1 




























12 




9 








PRDY 


0 






































- 


R/S 28 


1 






































31 


RESET 


1 








30 










30 


30 


30 












- 




17 


SMI 27 


1 




























12 










22 


SMIACT 


0 
































- 






32 


STPCLK 28 


1 








34 










34 


34 


34 




24 














Test and Debug 


FRCMC 


1 








































IERR


0 


20 


20 


20 


20 


20 


20 


20 


20 


20 


20 


20 


20 


20 


20 




20 


20 


20 




PRDY 


0 


See "External Interrupts, Interrupt Acknowledgments, and Reset" 


R/S 


1 


See "External Interrupts, Interrupt Acknowledgments, and Reset" 


TCK 


1 








































TDI 


1 








































TDO 


0 








































TMS 


1 








































TRST


1 








































Bus and Processor Clock


BF 


1 






























11










BF1-BF0 


1 






























11










CLK 


1 






























11
Notes to Table 5-2:

- Shading indicates signals that are meaningfully driven or sampled. Signals that are not shaded are not driven or sampled or are not
meaningful.

1. Inquire cycles can be driven while LOCK is asserted if AHOLD is used to obtain the bus for the inquire cycle. Inquire cycles never hit
locations involved in a locked operation because the processor invalidates such locations, if found in the cache, before doing the
locked operation. If the inquire cycle hits a modified location that is different from the one involved in the locked operation, the
writeback may be done in the middle of the locked operation, between the two locked cycles, with LOCK asserted during the writeback.

2. A31-A5 are I/O signals (inputs for inquire cycles), but A4-A3 are output only.

3. Sampled or driven during inquire cycles or resulting writebacks.

4. Sampled only during inquire cycles, but not driven for resulting writebacks.

5. If enabled by the IF flag in EFLAGS.

6. Output only.

7. If AHOLD is held asserted throughout an inquire cycle and writeback, system logic must use its latched copy of the inquire cycle
address for the writeback. By contrast, if system logic always negates AHOLD before the writeback, the processor will drive the
writeback address when it asserts ADS for the writeback.

8. Signal recognition and assertion applies to the actual state, not to the special cycle driven by the processor prior to entering the state. 

9. During SMM, NMI and INIT are recognized only in response to an IRET instruction. After the return from SMM (RSM instruction), a
latched NMI or INIT will be serviced.

10. A20M is recognized only in Real mode, and masking is applied to linear addresses. Because the caches are linearly tagged, assertion
of A20M during Real mode affects all program-generated cache addresses, including cache-line fills (caused by read misses), cache
writethroughs (caused by write misses or write hits to lines in the shared state), and cache accesses that occur while the processor
does not control the bus. However, A20M does not mask inquire cycle addresses or any writebacks caused by inquire cycles; these
addresses are looked up only in the physical tags, which are not masked by A20M.

11. CLK can be driven with a different frequency, and/or BF can be changed, when CLK is restarted on exit from the Stop-Clock state.

12. Latched or (in the case of BUSCHK) otherwise sampled and held, pending exit from this state.

13. SCYC may be asserted during any misaligned memory or I/O cycle, but it is only meaningful during locked cycles.

14. Includes Protected, Virtual-8086, and Real modes, unless otherwise indicated.

15. During the Hardware Debug Tool (HDT) mode, this signal is only meaningful for cache write misses (PWT=0 and WB/WT=1 transition
a shared line to an exclusive line). The signal is not meaningful during cache read misses in the HDT mode, because the caches
are never filled during the HDT mode.

16. Sampled or driven only during the completion of a cycle the processor initiated before the assertion of AHOLD, or for writebacks due
to inquire cycles.

17. Different than the Pentium processor. The system hardware or software must exit the HDT before asserting RESET. 

18. NA acts as an assertion of BRDY, but only when sampled with KEN or WB/WT. It is valid only for memory reads and writes, including
writethroughs during cache hits to shared or exclusive lines. NA has no effect on any signals other than KEN and WB/WT, and
addresses are not pipelined when NA is asserted.

19. If an inquire cycle occurs during a Branch-Trace Message special cycle, the branch address information driven by the processor on
A31-A3 can be overwritten by the inquiring bus master. In such cases, external logic should latch A31-A3 when ADS is asserted (i.e.,
before asserting AHOLD, BOFF, or HOLD).

20. Used only to report errors in Functional Redundancy Checking mode and driven only by the Checker. 

21. This signal is not meaningful during cache read misses in the HDT mode, because the caches are never filled in the HDT mode. 

22. The debugger can force the processor into SMM, but the processor will not recognize SMI until PRDY is negated. If SMI is asserted 
while PRDY is asserted, it is latched and acted upon after PRDY is negated. 

23. During AHOLD, the system must prevent other bus masters from locking the same address that the AMD-K5 processor is locking.

24. Different than the Pentium processor, which ignores STPCLK in this state.

25. Always negated (non-cacheable). 

26. EWBE is not checked prior to running special bus cycles or interrupt acknowledge operations. All special bus cycles (which have
W/R=1) and interrupt acknowledge operations (which have W/R=0) serialize the pipeline and do not require EWBE for this purpose.

27. An edge-triggered interrupt. It is latched when sampled and recognized on an instruction boundary. 

28. A level-sensitive interrupt. It must be held asserted until recognized, which occurs on an instruction boundary. 

29. Unlike other level-sensitive interrupts, BUSCHK is sampled with every BRDY and it does not need to be held asserted after sampling.
If BUSCHK is asserted during a locked operation or inquire cycle, an enabled machine-check exception will not be acted upon until
after the last BRDY of the locked operation or after a writeback caused by an inquire cycle.

30. The first code fetch after register initialization during INIT or RESET does not occur if AHOLD, BOFF, or HLDA is asserted.

31. PRDY is asserted either when R/S goes Low or when the Test Access Port (TAP) instruction, USEHDT, is executed. In the latter case,
R/S is watched for a Low-to-High transition, which takes the processor out of the Hardware Debug Tool (HDT) mode.

32. The processor can go into the Hardware Debug Tool (HDT) mode from within SMM either when R/S goes Low or when the TAP 
instruction, USEHDT, is executed (the instruction causes the processor to assert PRDY). In this case, SMIACT can be toggled with HDT 
commands. SMIACT selects main or SMM memory. 

33. Only NMI, INIT, RESET, and SMI get the processor out of the Shutdown state.

34. The processor cannot drive the Stop-Grant special bus cycle. 

35. HOLD is sampled, but the only practical effect is to assert HLDA. 

36. Writebacks or writethroughs cannot occur when HLDA is asserted. 

37. During writebacks. 

38. During writebacks or writethroughs. 

39. Including writebacks and writethroughs (except for HLDA). 

40. The processor cannot drive the interrupt acknowledge cycle, and therefore cannot obtain the interrupt vector. 

41. If FLUSH is asserted while AHOLD, BOFF, or HLDA is asserted, the outcome of the flush depends on whether the flush causes
writebacks of modified lines. If no writebacks are needed, the processor invalidates all lines but does not perform the FLUSH-acknowledge
cycle until the processor gets control of the bus again. If a writeback is needed, the processor stops at that writeback without having
invalidated any lines, waits until control of the bus is returned to the processor, then completes the FLUSH operation.

42. Driven or sampled only during reads. 

43. Sampled after AHOLD or HLDA is asserted, and while the processor completes an in-progress bus cycle. 

44. Without ADS during cache accesses, with ADS during cache writethroughs and writebacks.
5.1.3 External Interrupts

Interrupts and exceptions are often differentiated in x86 docu- 
mentation as follows: an interrupt is the assertion of a hard- 
ware input signal and an exception is a software event, such as 
an invalid opcode or execution of an INTn instruction. In some 
documents, however, the terms interrupt and exception apply to 
both hardware and software events, which are then differenti- 
ated as external or hardware interrupts or exceptions, and inter- 
nal or software interrupts or exceptions, respectively. In still 
other x86 documents, the term software interrupt means an 
INTn instruction that vectors to an interrupt gate. Moreover, 
some of the old rules commonly applied to interrupts do not 
apply to the external interrupts defined for the Pentium pro- 
cessor: for example, not all external interrupts alter the pro- 
gram flow, and not all are acknowledged by the processor. 

Because these variations in definition are potentially confus- 
ing, this document assumes only the following definitions: 

■ Interrupt — The assertion (or in the case of R/S, the driving
Low) of one of eight hardware input signals (BUSCHK, R/S,
FLUSH, SMI, INIT, NMI, INTR, or STPCLK).

■ Exception — Any software-initiated event that accesses an 
entry in the Real mode interrupt vector table (IVT) or in 
the Protected mode interrupt descriptor table (IDT). 

■ External Interrupt — Same as interrupt. 

■ Software Interrupt — In Real mode, any INTn instruction. In
Protected mode, any INTn instruction that vectors to an 
IDT entry that is an interrupt gate, or that is a task gate 
which references a TSS with the interrupt flag (IF) cleared 
in its EFLAGS image. (INTn instructions that vector to a 
trap gate are not considered software interrupts because 
the processor does not clear IF in such cases.) 
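The software-interrupt definition above reduces to a small decision rule. The following is an illustrative sketch only: `IdtEntry`, its `kind` labels, and the `tss_if_clear` field are hypothetical names chosen for this example, not processor or operating-system data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdtEntry:
    """Hypothetical description of the IDT entry an INTn instruction vectors to."""
    kind: str                   # "interrupt_gate", "trap_gate", or "task_gate" (illustrative labels)
    tss_if_clear: bool = False  # task gates only: IF is 0 in the EFLAGS image of the referenced TSS


def is_software_interrupt(real_mode: bool, entry: Optional[IdtEntry] = None) -> bool:
    """Apply this manual's definition of a software interrupt to one INTn instruction."""
    if real_mode:
        return True  # Real mode: every INTn instruction qualifies
    if entry is None:
        raise ValueError("Protected mode: the target IDT entry is required")
    if entry.kind == "interrupt_gate":
        return True
    if entry.kind == "task_gate":
        return entry.tss_if_clear
    return False  # trap gates do not clear IF, so they are not software interrupts


print(is_software_interrupt(False, IdtEntry("trap_gate")))  # False
```

The trap-gate case returns False for the reason the definition gives: the processor does not clear IF when vectoring through a trap gate.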

All interrupts are recognized on the next instruction retire- 
ment boundary. Most exceptions are recognized at the point in 
the instruction where they occur, and are not usually deferred 
to the end of the instruction. All interrupts and exceptions 
invalidate (flush) the pipeline when recognized (as defined in 
Section 2.2.5 on page 2-12). All exceptions are handled pre- 
cisely so that the instruction causing an exception can be 
restarted after the exception is serviced. 
The processor writes (pushes) its current state onto the stack
prior to entering the service routine for exceptions and for
BUSCHK, SMI, NMI, and INTR interrupts. Because of these
writes, the state of EWBE affects the processor’s response to
such interrupts and exceptions. For example, if the processor
has initiated a write cycle prior to the next instruction
retirement boundary on which such an interrupt would otherwise
be recognized, the bus cycle completes but the processor does
not respond to the interrupt until it samples EWBE asserted so
that it can write to the stack. Also, if the processor has written
to the stack once and EWBE is not asserted thereafter, the
processor does not write again and its response to the interrupt
stalls. A negated EWBE also pauses the processor’s response
to FLUSH if the flush causes writebacks. However, during
interrupts that do not write to memory (R/S, FLUSH if there
are no writebacks, INIT, and STPCLK), the state of EWBE has
no effect on the processor’s recognition of or response to such
interrupts.

The processor performs an interrupt by executing a microcode 
routine. In this sense, an interrupt acts like the execution of a 
complex instruction and the microcode routine has a comple- 
tion boundary that acts like an instruction retirement bound- 
ary. In effect, the microcode routine for an interrupt begins 
executing when the interrupt is recognized on an instruction 
boundary and it finishes executing when an associated inter- 
rupt service routine begins or the hardware aspect of the inter- 
rupt function otherwise completes. For example, the FLUSH 
interrupt completes when all modified cache lines have been 
written back to memory and all cache lines are invalidated, 
whereas the R/S interrupt completes when the processor
negates PRDY, and the STPCLK interrupt completes when the
processor drives the Stop Grant special bus cycle.

The four edge-triggered interrupts (FLUSH, SMI, INIT, and
NMI) are latched on one of the edges of CLK when they are
asserted and are recognized later, even if they are negated
before being recognized. The four level-sensitive interrupts
(BUSCHK, R/S, INTR, and STPCLK) must be held asserted
until recognized, except that the BUSCHK interrupt is sampled
and latched with every BRDY.
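The difference between latched (edge-triggered) and level-sensitive inputs can be illustrated with a toy software model. This is a hypothetical sketch, not a description of the processor's logic: it assumes one `sample()` call per CLK edge, treats a negated-to-asserted transition as the latching event, and leaves out the BUSCHK special case (sampled and latched with every BRDY).

```python
class InterruptPin:
    """Toy model of one interrupt input (hypothetical; not the processor's actual logic)."""

    def __init__(self, name: str, edge_triggered: bool) -> None:
        self.name = name
        self.edge_triggered = edge_triggered
        self.asserted = False  # state seen at the most recent CLK sample
        self.latched = False   # used only by edge-triggered inputs

    def sample(self, asserted: bool) -> None:
        """Record the signal state at one CLK edge."""
        if self.edge_triggered and asserted and not self.asserted:
            self.latched = True  # the assertion is remembered even if negated later
        self.asserted = asserted

    def pending(self) -> bool:
        """Checked at an instruction retirement boundary."""
        return self.latched if self.edge_triggered else self.asserted

    def recognize(self) -> None:
        """Clear the latch once the interrupt has been recognized."""
        self.latched = False


nmi = InterruptPin("NMI", edge_triggered=True)
intr = InterruptPin("INTR", edge_triggered=False)
for level in (True, False):  # both pins pulse and are negated before the boundary
    nmi.sample(level)
    intr.sample(level)
print(nmi.pending(), intr.pending())  # True False
```

The short pulse survives on NMI because it was latched, while the INTR pulse is lost because INTR was not held asserted until the instruction boundary.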

The processor disables the recognition of interrupts or excep- 
tions in the following cases: 
■ INTR Interrupts — The processor disables INTR interrupts
during all software interrupts (that is, INTn instructions that 
vector through interrupt gates or through task gates that 
reference a TSS with IF cleared in its EFLAGS image). It 
does this by automatically clearing the IF bit in EFLAGS. If 
system logic can leave the INTR signal asserted after the 
INTR service routine is entered, the interrupt vector 
returned by system logic during the interrupt acknowledge
operation must be for an interrupt gate or for a task gate 
that references a TSS with IF cleared. (Software may set 
the IF flag again upon entering the service routine.) 

■ NMI Interrupts — The processor disables NMI interrupts 
until the IRET of the NMI service routine. 

■ Debug Breakpoints — After a debug breakpoint exception, 
the debug service routine can disable debug exceptions for 
one instruction by setting the resume flag (RF) in EFLAGS 
to 1 to prevent restarted instructions from generating 
another debug fault. 

Table 5-3 shows the characteristics of interrupts and excep- 
tions and the priority with which the processor recognizes 
them. The term priority means two things here: 

■ Simultaneous Interrupts — The order in which a single inter- 
rupt or exception is selected for recognition if all occur 
simultaneously, and 

■ Latched Interrupts — The order in which latched interrupts 
(any of the four edge-triggered interrupts, FLUSH, SMI, 
INIT, or NMI) are recognized when the processor becomes 
interruptible again after it recognizes a prior interrupt or 
exception. By contrast, the term priority does not mean the 
order in which level-sensitive interrupts (BUSCHK, R/S,
INTR, and STPCLK) are nested if one such interrupt occurs 
while the processor is responding to another interrupt. 

Interrupts are themselves interruptible only if they have a 
software component, such as a service routine. All other inter- 
rupts complete their action before the processor recognizes 
another interrupt. Lower-priority interruptible interrupts can 
be interrupted by higher-priority interrupts or exceptions at 
their point of interruptibility, as shown in the right-most column 
of Table 5-3, which is always on an instruction boundary. 
Table 5-3. Summary of Interrupts and Exceptions

Priority | Description | Type | Sampling (Note 5) | Vector (Note 1) | Acknowledgment | Point of Interruptibility (Note 6)
1 | INTn instructions and all other software exceptions | exceptions | internal | 0-255 | none | Entry to service routine.
2 | BUSCHK | interrupt | level-sensitive | 18 (Note 2) | none | Entry to service routine. (Note 2)
3 | R/S | interrupt | level-sensitive | none | PRDY | Negation of PRDY.
4 | FLUSH | interrupt | edge-triggered (Note 4) | none | FLUSH-Acknowledge special bus cycle | BRDY of FLUSH-Acknowledge bus cycle.
5 | SMI | interrupt | edge-triggered (Note 4) | SMM (Note 3) | SMIACT | Entry to SMM service routine. (Note 7)
6 | INIT | interrupt | edge-triggered (Note 4) | BIOS | none | Completion of initialization.
7 | NMI | interrupt | edge-triggered (Note 4) | 2 | none | NMI interrupts: IRET from service routine. All others: Entry to service routine.
8 | INTR | interrupt | level-sensitive | 0-255 | Interrupt-acknowledge special bus cycle | Entry to service routine.
9 | STPCLK | interrupt | level-sensitive | none | Stop-Grant special bus cycle | Negation of STPCLK.


Notes:

1. For interrupts with vectors, the processor saves its state prior to accessing the service routine and changing program flow.
Interrupts without vectors do not change program flow; instead, they simply pause program flow for the duration of the interrupt
function and then return to where they left off.

2. If the machine check enable (MCE) bit in CR4 is set to 1.

3. The entry point for the SMM interrupt handler is at offset 8000h from the SMM Base Address.

4. Only the edge-triggered interrupts are latched when asserted. All interrupts are recognized at the next instruction retirement
boundary.

5. If a bus cycle is in progress, EWBE must be asserted before the interrupt is recognized.

6. For external interrupts (most exceptions, by contrast, are recognized when they occur). External interrupts are recognized at
instruction boundaries. MOV or POP instructions that load SS delay interruptibility until after the next instruction, thus allowing both
SS and the corresponding SP to load.

7. After assertion of SMI, subsequent assertions of SMI are masked so as to prevent recursive entry into SMM. Other exceptions or
interrupts (except INIT and NMI), however, will intervene in the SMM service routine.
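The Priority column of Table 5-3 can be read as a selection rule for events pending simultaneously at an instruction boundary. The following is a simplified sketch under stated assumptions: events are plain strings, and the only masking modeled is EFLAGS.IF for INTR, NMI blocking until the IRET of the NMI handler, and the SMM behavior of Note 7. EWBE, the SS-load delay of Note 6, and the details of individual exceptions are not modeled, and `select_event` is a hypothetical name.

```python
from typing import Iterable, Optional

# Table 5-3 priority order, highest first. "EXCEPTION" stands for INTn
# instructions and all other software exceptions (priority 1).
PRIORITY = ["EXCEPTION", "BUSCHK", "R/S", "FLUSH", "SMI", "INIT", "NMI", "INTR", "STPCLK"]


def select_event(pending: Iterable[str], if_flag: bool = True,
                 nmi_blocked: bool = False, in_smm: bool = False) -> Optional[str]:
    """Return the pending event recognized first, or None (simplified model)."""
    pending = set(pending)
    for name in PRIORITY:
        if name not in pending:
            continue
        if name == "INTR" and not if_flag:
            continue  # INTR is recognized only if EFLAGS.IF = 1
        if name == "NMI" and nmi_blocked:
            continue  # NMI stays disabled until the IRET of the NMI service routine
        if in_smm and name in ("SMI", "INIT", "NMI"):
            continue  # Note 7: SMI is masked; INIT and NMI do not intervene in SMM
        return name
    return None


print(select_event({"INTR", "NMI"}))                # NMI
print(select_event({"STPCLK", "INTR"}, if_flag=False))  # STPCLK
```

Masked events are skipped rather than discarded here; in the processor, a latched edge-triggered interrupt remains pending until it can be recognized.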



The processor recognizes BOFF, HOLD, and AHOLD while any
interrupt signal is asserted, and these signals will intervene
with their normal timing in the handling of any interrupt or
exception. The interrupt or exception continues from where it
left off after the intervening signal is negated. For example, if
BOFF is asserted while a FLUSH operation is writing modified
cache lines back to memory, an in-progress writeback will be
aborted, but it will be restarted after BOFF is negated, and the
FLUSH operation will then continue; any writebacks that
completed before BOFF was asserted are not affected.

5.1.4 Bus Signal Compatibility with Pentium Processor 

The differences in bus signal functions between the AMD-K5
and Pentium processors are described in Section A.1 on page
A-2.

5.2 Signal Descriptions 



The following pages describe each signal in detail. The bus 
cycle protocols that use these signals are described in Section 
5.3 on page 5-136. Chapter 6 describes the context in which the 
SMM and clock-control signals are used, and Chapter 7 does 
the same for the test signals. 
5.2.1 A20M (Address Bit 20 Mask)

Input

Summary

Assertion of A20M causes the processor to clear bit 20 of the
A31-A3 address bus to 0 prior to accessing the cache or memory
in Real mode. Clearing address bit 20 wraps addresses above
1 Mbyte around to addresses below 1 Mbyte.

Sampled

The processor samples A20M in every clock during Real mode.
System logic can drive the signal either synchronously or
asynchronously (see the data sheet for the setup and hold times
that apply when it is driven synchronously).

A20M is sampled only in Real mode during memory cycles
(including cache writethroughs and writebacks) and locked
cycles, or while AHOLD, BOFF, HLDA, RESET, INIT, or PRDY
is asserted. A20M is not sampled when the processor is
operating in Protected mode, Virtual-8086 mode, or SMM; during
I/O cycles, inquire cycles, special bus cycles, or interrupt
acknowledge operations; or while the processor is in the
Shutdown, Halt, Stop Grant, or Stop Clock states.

Details

Clearing A20 so that addresses above 1 Mbyte wrap around to
addresses below 1 Mbyte simulates the behavior of the 8086
processor, allowing the processor to run software designed for
DOS. A20M should be asserted only when the processor runs in
Real mode.
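The wraparound can be checked with a short address calculation. This sketch assumes the usual Real-mode rule that an address is the segment value times 16 plus the offset; the function names are hypothetical. With segment FFFFh and offset 0010h the sum is 0010_0000h, which A20M masking turns into 0000_0000h, as on an 8086.

```python
A20_MASK = ~(1 << 20)  # clears address bit 20


def real_mode_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset (may exceed 1 Mbyte)."""
    return (segment << 4) + offset


def apply_a20m(address: int, a20m_asserted: bool) -> int:
    """Model of A20M: force bit 20 to 0 while the signal is asserted."""
    return address & A20_MASK if a20m_asserted else address


top = real_mode_address(0xFFFF, 0xFFFF)       # highest Real-mode address
print(hex(top), hex(apply_a20m(top, True)))   # 0x10ffef 0xffef
```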

A20M should not be asserted during the first code fetch follow- 
ing the RESET or INIT cycles because the masking of bit 20 
leads to a fetch from an incorrect address. The BIOS and the 
operating system alone are responsible for controlling the 
state of A20M. After RESET or INIT, they do this by writing to 
an external I/O port. (I/O ports 60h and 64h, or port 92h, or
register-shadowed versions of those ports are commonly used to
control the state of A20M.) The instruction pipeline is serial- 
ized by virtue of writing to the I/O port, thus allowing time for 
the A20M signal to assert before the next memory or cache 
access. Advanced operating systems that do not run under 
DOS, such as Windows NT™ and OS/2 operating systems, do 
not use Real mode and never assert A20M. 

Programs running in Virtual-8086 mode run as tasks under Pro- 
tected mode. The effect of A20M for these Virtual-8086-mode 
tasks is normally emulated by the operating system using the 
paging mechanism. The operating system writes page table
entries so as to map all pages required for the Virtual-8086 
mode task to addresses below 1 Mbyte. 

Unlike the Pentium processor, the AMD-K5 processor ignores 
A20M in Protected mode, Virtual-8086 mode, and System Man- 
agement Mode (SMM). The Pentium processor masks the A20 
bit if A20M is asserted in Protected mode or Virtual-8086 
mode, even though this behavior is undefined and may change 
in future processors. The AMD-K5 processor simply ignores 
A20M except when the processor runs in Real mode. 

The AMD-K5 processor applies A20M masking to its linear 
cache tags, through which all programs access the caches. 

Thus, assertion of A20M affects all program-generated cache 
addresses, including cache-line fills (caused by read misses), 
cache writethroughs (caused by write misses or write hits to 
lines in the shared state), and cache accesses that occur while 
the processor does not control the bus. However, A20M does 
not mask writebacks or invalidations caused by internal 
snoops, inquire cycles, the FLUSH signal, or the WBINVD 
instruction — such addresses are looked up only in the physical 
tags, which are not masked by A20M. (See Table 2-3 on page
2-20 for details.) By contrast, the Pentium processor applies
masking only to physical addresses. This difference of masking 
linear vs. physical addresses is not visible to software because 
linear and physical addresses are identical in Real mode. 

However, the AMD-K5 processor’s A20M linear address mask- 
ing can affect debug software differently than such masking on 
the Pentium processor. With A20M asserted, the AMD-K5 pro- 
cessor does breakpoint matching (debug-register comparisons) 
on masked addresses, whereas the Pentium processor does 
them on unmasked addresses. 
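The breakpoint-matching difference can be illustrated with a simplified comparison that ignores breakpoint length and type fields. This is a hypothetical model rather than full debug-register semantics: `dr_address` stands for the linear address loaded into one debug address register.

```python
def breakpoint_matches(linear: int, dr_address: int, a20m_asserted: bool) -> dict:
    """Compare one access against one breakpoint address on each processor (simplified)."""
    masked = linear & ~(1 << 20) if a20m_asserted else linear
    return {
        "AMD-K5": masked == dr_address,   # compares the A20M-masked address
        "Pentium": linear == dr_address,  # compares the unmasked address
    }


# A Real-mode access to 0010_0010h with A20M asserted:
print(breakpoint_matches(0x100010, 0x000010, True))  # {'AMD-K5': True, 'Pentium': False}
print(breakpoint_matches(0x100010, 0x100010, True))  # {'AMD-K5': False, 'Pentium': True}
```

A debugger that sets a breakpoint above 1 Mbyte while A20M is asserted may therefore see it fire on one processor and not the other.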
5.2.2 A31-A3 (Address Bus)

A31-A5 Bidirectional, A4-A3 Output

Summary

A31-A3 carries the physical address for the current bus cycle.
The processor drives addresses on A31-A3 during memory and
I/O cycles, and cycle definition information during special bus
cycles. It samples addresses on A31-A5 during inquire cycles.

Driven, Sampled, and Floated

As Outputs: The processor drives A31-A3 from the clock in
which ADS is asserted until the last expected BRDY of the bus
cycle. The processor also drives A31-A3 without ADS during
cache accesses. A31-A3 are driven during memory cycles
(including cache writethroughs and writebacks), I/O cycles,
inquire cycle writebacks, locked cycles, special bus cycles, and
interrupt acknowledge operations in the normal operating
modes (Real, Protected, and Virtual-8086) and in SMM, and
while PRDY is asserted. During special bus cycles and
interrupt acknowledge operations, the address signals simply
support bus cycle definition; they do not provide an address.

The processor floats A31-A3 as outputs one clock after system
logic asserts AHOLD or BOFF, and in the same clock that the
processor asserts HLDA.

As Inputs: While AHOLD, BOFF, or HLDA is asserted, the
processor samples A31-A5 in the same clock as EADS. A31-A5
are sampled in this way during inquire cycles in the normal
operating modes (Real, Protected, and Virtual-8086) and in SMM,
including during the Shutdown, Halt, and Stop Grant states,
and while PRDY is asserted. The A4-A3 signals are not
interpreted as part of the inquire cycle address but must
nevertheless be driven at valid 0 or 1 logic levels. The
processor may again drive A31-A3 in the next clock after system
logic negates AHOLD, BOFF, or HOLD.

A31-A3 are never driven or sampled in the Stop Clock state, or
while RESET or INIT is asserted.

Details

During processor-initiated bus cycles, the processor drives
A31-A3 with ADS to define an eight-byte (quadword) starting
address in physical memory or I/O space. System logic
interprets these addresses in conjunction with the BE7-BE0 and
cycle definition (D/C, M/IO, and W/R) outputs, and with the
A20M input. The processor drives BE7-BE0 to define the
validity of each of the eight bytes accessed by the quadword
address on A31-A3. In this manner, BE7-BE0 replace the
function of address bits A2-A0, which do not exist.
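The division of a byte address into a quadword address on A31-A3 plus byte enables can be sketched as follows, for an access that fits within one aligned quadword. Assumptions: BE7-BE0 are treated as active Low (consistent with BE7-BE0 = 00h enabling all eight bytes for writebacks), bit n of the returned value corresponds to BEn, and accesses that cross a quadword boundary are simply rejected here. The function name is hypothetical.

```python
def quadword_address_and_be(address: int, size: int) -> tuple:
    """Return (value on A31-A3 as a byte address, BE7-BE0 with bit n = BEn, active Low)."""
    offset = address & 0x7                   # the low three bits that A2-A0 would have carried
    if not 1 <= size <= 8 or offset + size > 8:
        raise ValueError("access must fit within one aligned quadword")
    enabled = ((1 << size) - 1) << offset    # 1 = byte lane enabled
    return address & ~0x7, ~enabled & 0xFF   # invert for active-Low byte enables


quad, be = quadword_address_and_be(0x1003, 2)
print(hex(quad), format(be, "08b"))  # 0x1000 11100111
```

In the example, a two-byte access at 1003h drives quadword address 1000h with BE4 and BE3 asserted (Low).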

When the processor drives burst reads, it drives the starting
address on A31-A3 (which is the address of the quadword that
contains the instruction or data required) and it drives
BE7-BE0 to specify the required bytes in that quadword. (This
addressing method is unlike the 486 processor, which drives
separate addresses for each transfer of a burst.) System logic
must determine the remaining three quadword addresses as
shown in Table 5-4.

When the processor drives burst writes (writebacks), it drives
the starting address on A31-A3 in the same manner as for
burst reads, but it enables all eight bytes (BE7-BE0 = 00h)
because it always starts writebacks at 32-byte aligned
addresses (address of the first quadword is xxxx_xx00h). Thus,
A4-A3 are always 00b for writebacks.



Table 5-4. Address-Generation Sequence During Bursts

Address Driven By      Address of Subsequent Quadwords(1)
Processor on A31-A3    Generated By System Logic
Quadword 1             Quadword 2    Quadword 3    Quadword 4
...00h                 ...08h        ...10h        ...18h
...08h                 ...00h        ...18h        ...10h
...10h                 ...18h        ...00h        ...08h
...18h                 ...10h        ...08h        ...00h

Notes:

1. quadword = 8 bytes
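The four rows of Table 5-4 follow a simple pattern: the quadword index within the 32-byte line (A4-A3) of each transfer equals the starting index XORed with 0, 1, 2, and 3 in turn. The XOR formulation is an observation about the table rather than a statement from this manual, and `burst_sequence` is a hypothetical helper, but it is a compact way to model the system-logic address counter:

```python
from typing import List


def burst_sequence(start: int) -> List[int]:
    """Quadword addresses of a four-transfer burst, per Table 5-4.

    start is the address the processor drives on A31-A3.
    """
    line = start & ~0x1F              # 32-byte line base (bits 4-0 cleared)
    first = (start >> 3) & 0x3        # starting quadword index from A4-A3
    return [line | ((first ^ i) << 3) for i in range(4)]
```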



System logic can derive memory and I/O port select signals, as 
well as memory row and address signals, from A31-A3 and the 
cycle definition signals. Although the processor does not inter- 
pret the A4-A3 signals as part of an inquire cycle address, sys- 
tem logic must drive them at valid logic levels (0 or 1) during 
inquire cycles, and the processor drives both bits to 0 during 
writebacks. 

While system logic has obtained control of the address bus via assertion of AHOLD, BOFF, or HOLD, the A31-A5 signals become inputs and define a 32-byte, cache-line, inquire cycle address in conjunction with the following signals:

■ The EADS input defines the beginning of the inquire cycle and validates the input address on A31-A5.

■ The AP input carries the even parity bit for the A31-A5 
address. 

■ The APCHK output indicates a parity error for the inquire 
cycle address on A31-A5. 

During such system-initiated inquire cycles, A31-A5 defines 
the starting physical address of a 32-byte cache line that is 
being snooped in the processor’s on-chip instruction and data 
caches. The processor interprets the addresses using its physi- 
cal address tags, in conjunction with the A20M input, in paral- 
lel with the processor’s own cache accesses that use its linear 
cache tags. 

If an inquire cycle hits a modified line in the processor’s data 
cache, the processor performs a writeback. During this write- 
back, A31-A5 defines a 32-byte starting address in physical 
memory. This address is identified by the processor’s assertion 
of ADS, just as with all other processor-initiated bus cycles, 
and the address must be interpreted by system logic in con- 
junction with the A20M input. 

The processor does not control the complete bus during a writeback caused by an inquire cycle; in these cases, AHOLD, BOFF, or HOLD may still be asserted. However, in addition to writebacks caused by inquire cycle hits, writebacks can also occur while the processor controls the bus (by processor-initiated cache-line replacements, internal snoops for self-modifying code, or execution of the WBINVD instruction) or as a result of system-initiated assertion of the FLUSH signal.

If AHOLD is held asserted throughout an inquire cycle and 
writeback, system logic must latch the inquire cycle address 
when it asserts EADS. This is required so that, if the inquire 
cycle hits a modified line (HITM asserted), the processor need 
not drive the writeback address when it asserts ADS for the 
writeback, which can occur as early as two clocks after the pro- 
cessor asserts HITM. Instead, system logic must use its latched 
copy of the inquire cycle address for the writeback. By con- 
trast, if system logic always negates AHOLD before the write- 
back, the processor will drive the writeback address when it 
asserts ADS for the writeback, and system logic need not 
retain a copy of the inquire cycle address. 
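A minimal behavioral sketch of this latching requirement, in Python, follows. The class and method names are hypothetical; the model only captures which address system logic should use for an inquire writeback, not signal timing.

```python
from typing import Optional


class WritebackAddressTracker:
    """Hypothetical system-logic model of the inquire-address latch."""

    def __init__(self) -> None:
        self.latched: Optional[int] = None

    def on_eads(self, inquire_addr: int) -> None:
        # Latch A31-A5 in the clock EADS is asserted (bits 4-0 are not used).
        self.latched = inquire_addr & ~0x1F

    def writeback_address(self, ahold_held: bool,
                          bus_addr: Optional[int]) -> int:
        """Address to use when the processor asserts ADS for the writeback."""
        if ahold_held:
            # A31-A5 are still inputs, so the processor drives no address.
            if self.latched is None:
                raise RuntimeError("no inquire address was latched")
            return self.latched
        # AHOLD was negated: the processor drives the address itself.
        if bus_addr is None:
            raise ValueError("expected a processor-driven address")
        return bus_addr & ~0x1F
```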
If an inquire cycle occurs while the processor is driving a Branch-Trace Message special bus cycle, the branch address information driven by the processor on A31-A3 can be overwritten by the inquiring bus master. In such cases, system logic should latch A31-A3 when ADS is asserted (that is, before asserting AHOLD, BOFF, or HOLD).

At the falling edge of RESET, the states of BRDYC and BUSCHK control the drive strength on A21-A3 (not including A31-A22). The drive strength is weak for all states of BRDYC and BUSCHK except BRDYC and BUSCHK both Low (0), in which case the drive strength is strong. The A31-A22 signals use the weak drive strength at all times. See the data sheet for details.
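The reset-time selection reduces to a single condition on the two sampled levels. The sketch below is a hypothetical helper for checking a board's reset configuration, not a description of the processor's internal circuit:

```python
def address_drive_strength(brdyc: int, buschk: int) -> str:
    """Drive strength of A21-A3 selected at the falling edge of RESET.

    brdyc, buschk -- logic levels (0 = Low) sampled at RESET's falling edge.
    Strong only when both are Low; A31-A22 are always weak (not modeled).
    """
    return "strong" if (brdyc, buschk) == (0, 0) else "weak"
```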

Unlike the Pentium processor, the AMD-K5 processor does not support pipelined address-data transactions. Thus, the NA input has no effect on the processor's address bus. NA only affects the sampling time for the KEN and WB/WT inputs.
5.2.3 ADS (Address Strobe)

Output

Summary
The processor asserts ADS to specify the beginning of a memory or I/O bus cycle, or a cache writeback to memory. The signal validates the processor's address and cycle definition signals, and system logic can use it to enable accesses to memory and I/O.

Driven and Floated
During processor-initiated bus cycles, the processor asserts ADS for one clock at the beginning of each bus cycle. During writeback cycles, whether initiated by the processor or by system logic, the processor asserts ADS for one clock as early as two clocks after the processor asserts HITM. The processor can assert ADS as early as two clocks after the assertion of BRDY (thus allowing one idle, or dead, clock between any two bus cycles), and as early as one clock after the negation of AHOLD, BOFF, or HLDA.
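The "as early as" rules above give lower bounds on when the next ADS can appear. The hypothetical Python helper below combines them; clocks are integer indexes, and system logic must still accept an ADS that arrives later than the computed clock. The sketch treats each argument as an independent lower bound and does not check whether a given combination of constraints can occur together.

```python
from typing import Optional


def earliest_ads(last_brdy: Optional[int] = None,
                 hold_release: Optional[int] = None,
                 hitm: Optional[int] = None) -> int:
    """Earliest clock in which the processor may assert ADS.

    last_brdy    -- clock of the final BRDY of the previous bus cycle
    hold_release -- clock in which AHOLD, BOFF, or HLDA was negated
    hitm         -- clock in which HITM was asserted (inquire writeback)
    None means the corresponding constraint does not apply.
    """
    bounds = [0]
    if last_brdy is not None:
        bounds.append(last_brdy + 2)     # one idle (dead) clock between cycles
    if hold_release is not None:
        bounds.append(hold_release + 1)
    if hitm is not None:
        bounds.append(hitm + 2)
    return max(bounds)
```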

ADS is driven during memory cycles (including cache 
writethroughs and writebacks), I/O cycles, locked cycles, spe- 
cial bus cycles, and interrupt acknowledge operations in the 
normal operating modes (Real, Protected, and Virtual-8086) 
and in SMM, or while PRDY is asserted. While AHOLD is 
asserted, and during the Shutdown, Halt, and Stop Grant 
states, ADS is driven only for writebacks that result from 
inquire cycle hits. ADS is not driven during the Stop Clock 
state, or while BOFF, HLDA, RESET, or INIT is asserted. 

The processor floats ADS one clock after system logic asserts 
BOFF and in the same clock that the processor asserts HLDA. 

Details
The processor initiates bus cycles for the purpose of reading and writing memory or I/O, and for writebacks of modified cache lines. While the processor controls the bus, or while it is writing back a modified cache line (whether in control of the bus or not), ADS defines the beginning of the cycle. In the clock that it asserts ADS, the processor also begins driving the signals that define and qualify the bus cycle, including A31-A3 (or A31-A5 for writebacks), AP, the cycle definition signals (D/C, M/IO, and W/R), BE7-BE0, BREQ, A20M, CACHE, LOCK, PCD, PWT, and SCYC.

If ADS initiates a cache line fill and all four ways of the cache that could accommodate the incoming line are filled with valid entries, the processor uses a pseudo-random algorithm to select a line for replacement. If the selected line is cached in the modified state, it must be written back to memory. In this case, the order of events is:

1. Complete the burst read, placing the incoming cache line in 
the processor’s line fill buffer. 

2. Write the modified line back to memory. 

3. Fill the vacated cache line with the contents of the line 
buffer. 
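The ordering above can be illustrated with a small Python model of one four-way cache set. The replacement choice uses `random.Random` purely as a stand-in; the processor's actual pseudo-random algorithm is not described here, and the class and function names are hypothetical. The model only records the order in which the events occur.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Way:
    tag: Optional[int] = None   # None: the way holds no valid line
    modified: bool = False


@dataclass
class CacheSet:
    # Four ways per set, as in the description above.
    ways: List[Way] = field(default_factory=lambda: [Way() for _ in range(4)])


def line_fill(cset: CacheSet, tag: int, rng: random.Random) -> List[str]:
    """Return the order of events for one line fill into cset."""
    events = ["burst read into line fill buffer"]                 # step 1
    empty = [w for w in cset.ways if w.tag is None]
    if empty:
        victim = empty[0]
    else:
        victim = rng.choice(cset.ways)   # stand-in for pseudo-random choice
        if victim.modified:
            events.append(f"writeback of modified line {victim.tag:#x}")  # step 2
    victim.tag, victim.modified = tag, False
    events.append("fill cache line from line fill buffer")        # step 3
    return events
```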

Processor-initiated writebacks can occur during cache line replacement, internal snoops for self-modifying code, and execution of the WBINVD instruction. System-initiated writebacks can occur during inquire cycle hits to modified cache lines (while AHOLD, BOFF, or HLDA is asserted) or as a result of assertion of the FLUSH input. The processor drives writebacks by asserting ADS and either reusing the inquire cycle address (if AHOLD is held asserted throughout the writeback) or driving the address itself (if AHOLD is negated for the writeback, or if BOFF or HOLD was used to obtain the bus).

During an inquire cycle that hits a modified cache line, the processor asserts ADS as early as two clocks after asserting HITM, regardless of whether AHOLD is asserted or negated. By contrast, if BOFF or HLDA is asserted instead of AHOLD during an inquire hit, the processor postpones the writeback until after BOFF or HLDA is negated.

During special bus cycles and interrupt acknowledge opera- 
tions, the processor drives ADS to validate A31-A3, BE7-BE0 
and the cycle definition signals. This use of ADS and A31-A3 
simply serves to identify the type of special bus cycle, rather 
than to address a location in memory or I/O space. 

The processor asserts BREQ in the same clock that it asserts ADS, although BREQ is also asserted at other times (see the description of BREQ on page 5-45). The processor negates ADS for one clock between any contiguous bus operations, such as between a single-transfer I/O write and a burst read from memory, or between two burst reads. The same is true for contiguous sequences of locked operations (sequences of locked bus cycle pairs). System logic can use the negation of ADS between contiguous bus operations to make the bus available to other bus masters, thus intervening temporarily in the processor's sequential operations.

If BOFF is asserted while ADS is asserted, ADS remains Low (floats asserted). System logic must consider this when interpreting the state of ADS after negating BOFF. In the next clock after BOFF is negated, the processor may reassert ADS to restart a cycle that was aborted by the assertion of BOFF.

If system logic begins driving an inquire cycle by asserting AHOLD or BOFF and then asserting EADS with the inquire address, and the processor is driving a Branch-Trace Message special bus cycle at the same time that AHOLD or BOFF is asserted, the branch address information driven by the processor on A31-A3 can be overwritten by the inquiring bus master. In such cases, system logic should latch A31-A3 when ADS is asserted, before asserting AHOLD or BOFF.

At the falling edge of RESET, the states of BRDYC and BUSCHK control the drive strength on the A21-A3 (not including A31-A22), ADS, HITM, and W/R signals. The drive strength is weak for all states of BRDYC and BUSCHK except BRDYC and BUSCHK both Low (0), in which case the drive strength is strong. The A31-A22 signals use the weak drive strength at all times. See the data sheet for details.
5.2.4 ADSC (Address Strobe Copy)

Output

Summary
ADSC is an identical copy of ADS. In systems that would otherwise place large capacitive loads on ADS, the ADSC output can drive some of those loads instead of ADS, distributing the load and thereby improving response time.

Driven and Floated
ADSC is driven and floated with the same timing as ADS. See the description of ADS on page 5-24.

Details
See the description of ADS on page 5-24.
5.2.5 AHOLD (Address Hold)

Input

Summary
System logic can assert AHOLD to obtain control of the bidirectional A31-A3 address bus and AP address parity signal to drive one or more inquire cycles to the processor.

Sampled
The processor samples AHOLD in every clock and responds by floating the bidirectional A31-A3 and AP signals one clock after AHOLD is asserted.

AHOLD is sampled during memory cycles (including cache writethroughs and writebacks), I/O cycles, inquire cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; in the Shutdown, Halt, or Stop Grant states; or while RESET, INIT, or PRDY is asserted. AHOLD is sampled but not effective when BOFF or HLDA is asserted. AHOLD is not sampled during the Stop Clock state.

Details

The sole function of AHOLD is to support inquire cycles. There are three methods by which system logic can obtain control of the address bus to drive an inquire cycle: AHOLD, BOFF, or HOLD. AHOLD obtains control only of the address bus and allows another master or system logic to drive only inquire cycles, whereas BOFF and HOLD obtain control of the full bus (address and data), allowing another master to drive not only inquire cycles but also read and write cycles. AHOLD and HOLD both permit an in-progress bus cycle to complete, but a writeback can occur while AHOLD is asserted, whereas a writeback that is pending during the assertion of BOFF or HOLD occurs after BOFF or HOLD is negated.
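For quick reference when choosing among the three methods, the properties stated in this manual for AHOLD, BOFF, and HOLD can be collected in a lookup table. The dictionary layout and field names below are illustrative assumptions, not part of any AMD software interface.

```python
from typing import Dict, List

# Properties of the three bus-hold inputs, as described in this chapter.
BUS_HOLD_INPUTS: Dict[str, Dict[str, str]] = {
    "AHOLD": {
        "releases": "address bus only (A31-A3, AP)",
        "other_master_may_drive": "inquire cycles only",
        "in_progress_cycle": "completes",
        "inquire_writeback": "during assertion",
    },
    "BOFF": {
        "releases": "full bus",
        "other_master_may_drive": "any bus cycle",
        "in_progress_cycle": "aborted, restarted after negation",
        "inquire_writeback": "after negation",
    },
    "HOLD": {
        "releases": "full bus",
        "other_master_may_drive": "any bus cycle",
        "in_progress_cycle": "completes",
        "inquire_writeback": "after negation",
    },
}


def inputs_allowing_writeback_during_hold() -> List[str]:
    """Names of the inputs under which an inquire writeback can run."""
    return [name for name, props in BUS_HOLD_INPUTS.items()
            if props["inquire_writeback"] == "during assertion"]
```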

AHOLD is useful primarily in systems with multiple buses and multiple bus masters, where operations can occur on the separate buses independently and in parallel. This configuration occurs, for example, if the processor shares a bus only with a look-through L2 cache, and other caching masters work in parallel on another bus that is isolated from the processor by system logic. In such designs, system logic may drive separate AHOLD signals to each bus master in the system. For details on how AHOLD can be driven in such configurations, see Section 6.2.5 on page 6-14.
When the processor releases control of A31-A3 and AP in response to AHOLD, the processor still maintains control of the remaining signals on the bus so that it can (a) finish driving a bus cycle it may have begun before AHOLD was asserted, and (b) drive a writeback if an inquire cycle hits a modified line in the processor's data cache. However, the processor cannot begin driving a new bus cycle while AHOLD is asserted because system logic controls the address bus.

System logic drives inquire cycles with the EADS, A31-A5, AP, and INV inputs. A typical sequence for an inquire cycle is: assert AHOLD; two clocks later, assert EADS and drive A31-A5 and INV; wait two clocks for the processor to assert HITM and/or HIT. If HITM remains negated two clocks after EADS is asserted, the inquire cycle ends. If HITM is asserted at that time, the processor begins driving a four-transfer burst writeback as early as two clocks after asserting HITM.
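The typical sequence above can be written as a clock-by-clock timeline. In this hypothetical Python sketch, clocks are counted from the clock in which AHOLD is asserted, and the writeback ADS is placed at its earliest permitted clock; real hardware may assert it later.

```python
from typing import List, Tuple


def inquire_timeline(t0: int, hitm: bool) -> List[Tuple[int, str]]:
    """Typical AHOLD inquire cycle; t0 is the clock AHOLD is asserted."""
    events = [
        (t0, "system logic asserts AHOLD"),
        (t0 + 1, "processor floats A31-A3 and AP"),
        (t0 + 2, "system logic asserts EADS, drives A31-A5 and INV"),
        (t0 + 4, "HIT/HITM result available to system logic"),
    ]
    if hitm:
        # Lower bound only: ADS may come later than two clocks after HITM.
        events.append((t0 + 6, "earliest ADS for four-transfer writeback"))
    else:
        events.append((t0 + 4, "inquire cycle ends (HITM negated)"))
    return events
```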

AHOLD can be negated as early as one clock after EADS is 
asserted. If system logic holds AHOLD asserted throughout an 
inquire cycle and any required writeback, system logic must 
latch the inquire cycle address when it asserts EADS. This is 
required so that, if the inquire cycle hits a modified line 
(HITM asserted), the address used for the writeback need not
be driven by the processor when the processor asserts ADS for 
the writeback. Instead, A31-A5 remains an input-only bus and 
system logic must use its latched copy of the inquire cycle 
address. By contrast, if system logic always negates AHOLD 
before the writeback, the processor drives the writeback 
address when it asserts ADS for the writeback, and system 
logic need not retain a copy of the inquire cycle address. While 
the processor drives the writeback address, it drives only the 
beginning address for the 32-byte transfer on A31-A5. System 
logic must determine the remaining addresses as shown in 
Table 5-4 on page 5-21. 

If system logic asserts AHOLD while the processor is driving a 
locked cycle, the system must not allow accesses by other bus 
masters to lock the same address that the processor is locking. 

While AHOLD is asserted (after the completion of any in-progress bus cycle by the processor), the processor continues to execute out of its instruction and data caches, if possible. If the processor can no longer operate out of its caches, it holds BREQ asserted continuously. For a list of signals recognized while AHOLD is asserted, see Table 5-2 on page 5-8.

The processor may again drive its own cycles with ADS as early 
as one clock after system logic negates AHOLD. Before negat- 
ing AHOLD, however, system logic may need to arbitrate 
among potential contenders for the address bus so as to avoid 
deadlock contention for the bus. 

Ground-bounce spikes can be avoided by following two rules 
with respect to AHOLD: 

■ Do not negate AHOLD in the same clock that BRDY is asserted during a write cycle.

■ Do not negate AHOLD in the same clock that ADS is 
asserted during a writeback. 

These restrictions must be observed because the processor’s 32 
address drivers turn on almost immediately after AHOLD is 
negated. If the processor is driving data with BRDY on the 64- 
bit data bus at the same time, the processor then drives 96 bits 
simultaneously and ground-bounce spikes can occur. 
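A board-level simulation or trace checker can enforce the two rules mechanically. The trace format below (one dictionary of Boolean flags per clock) is a hypothetical convention chosen for illustration; a writeback counts as a write cycle for the first rule.

```python
from typing import Dict, List


def ahold_negation_violations(trace: List[Dict[str, bool]]) -> List[int]:
    """Return the clocks in which an AHOLD negation breaks either rule.

    trace[i] describes clock i using these optional Boolean keys:
      'ahold_negated' -- system logic negates AHOLD in this clock
      'brdy'          -- BRDY is asserted in this clock
      'write'         -- the current bus cycle is a write
      'ads_writeback' -- ADS is asserted for a writeback in this clock
    """
    bad = []
    for clk, s in enumerate(trace):
        if not s.get("ahold_negated"):
            continue
        if s.get("brdy") and s.get("write"):       # rule 1
            bad.append(clk)
        elif s.get("ads_writeback"):               # rule 2
            bad.append(clk)
    return bad
```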
5.2.6 AP (Address Parity)

Bidirectional

Summary
AP carries the even parity bit for cache line addresses driven and sampled on A31-A5. The processor drives AP when it drives an address for a read or write cycle. The processor samples AP during inquire cycles in order to drive the APCHK output.

Driven, Sampled, and Floated
AP is driven, sampled, and floated with the same timing as A31-A3. See the description of A31-A3 on page 5-20.

Details
The bit value driven on AP is counted with the bit values driven on A31-A5 to determine address parity. If the total number of 1 bits on AP and A31-A5 is even, the address is considered free of error (thus the term even parity). If the total number of 1 bits is odd, the address is considered to have an error. The bit values driven on A4-A3 are not counted during parity checking.
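The even-parity rule can be expressed directly in Python, for example in a test bench or system-logic model. The function names are hypothetical. Address bits 4-0 are masked off because A4-A3 are excluded from the parity calculation and A2-A0 do not exist.

```python
A31_A5_MASK = 0x7FF_FFFF   # 27 bits: A31-A5 after shifting right by 5


def address_parity(addr: int) -> int:
    """AP value that makes the number of 1s on A31-A5 plus AP even."""
    ones = bin((addr >> 5) & A31_A5_MASK).count("1")
    return ones & 1


def address_parity_ok(addr: int, ap: int) -> bool:
    """True if A31-A5 plus AP carry an even number of 1 bits (no APCHK)."""
    ones = bin((addr >> 5) & A31_A5_MASK).count("1")
    return (ones + ap) % 2 == 0
```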

In addition to generating and checking address parity, the processor also generates and checks data parity using the DP7-DP0 and PCHK signals. See pages 5-57 and 5-101 for details. Unlike the handling of PCHK, however, the processor does not capture the faulty address in a register when it asserts APCHK. System logic must handle the error externally. Typical PC systems assert an interrupt signal such as NMI after a parity error is detected.

Systems that do not implement address parity generation and 
checking should tie AP either High or Low and ignore the 
APCHK output. 
5.2.7 APCHK (Address Parity Check)

Output

Summary
The processor asserts APCHK if an even-parity error occurs on A31-A5 during an inquire cycle.

Driven
The processor drives APCHK for one clock, two clocks after system logic asserts EADS with an inquire address.

APCHK is driven under the same conditions in which EADS is sampled. See the description of EADS on page 5-58.

Details

System logic can use APCHK to initiate a remedy for the error. 
Typical PC systems assert an interrupt such as NMI if a parity 
error is detected. 

See the description of parity error determination for the AP 
input on page 5-31. Systems that do not implement address par- 
ity checking should ignore APCHK. 
5.2.8 BE7-BE0 (Byte Enables)

Output

Summary
The eight bits of BE7-BE0, when cleared to 0, validate the eight bytes driven on D63-D0. In this way, BE7-BE0 expand on the function of address bits A2-A0, which do not exist on the A31-A3 address bus. BE7-BE0 also help differentiate the special bus cycles.

Driven and Floated
The processor drives BE7-BE0 from the clock in which ADS is asserted until the last expected BRDY of the bus cycle. The processor floats BE7-BE0 one clock after system logic asserts BOFF and in the same clock that the processor asserts HLDA.

BE7-BE0 is driven with the address and cycle definition outputs (D/C, M/IO, and W/R) during memory cycles (including cache writethroughs and writebacks), I/O cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM, or while PRDY is asserted. While AHOLD is asserted, BE7-BE0 is driven only to complete a bus cycle that had been initiated before AHOLD was asserted, or for inquire cycle writebacks. During the Shutdown, Halt, and Stop Grant states, BE7-BE0 is driven only for inquire cycle writebacks. BE7-BE0 is not driven during the Stop Clock state, or while BOFF, HLDA, RESET, or INIT is asserted.

Details

Table 5-5 shows the relationship between BE7-BE0, D63-D0, 
DP7-DP0, and the effective relationship with A2-A0, the non- 
existent low address bits. The BE7-BE0 signals expand on the 
function of A2-A0; BE7-BE0 allow the processor to address 
any or all eight bytes indicated by A31-A3, whereas A2-A0, if 
they existed, would only address one of eight bytes. 

During single-transfer memory cycles and all I/O cycles, the 
processor drives BE7-BE0 to identify all of the bytes desired 
for the transfer. System logic must return valid data in those 
byte lanes of D63-D0. 

During burst reads (CACHE and KEN both asserted with the first BRDY of a memory read), the processor drives BE7-BE0 with ADS to identify the bytes of the desired instruction or operand. The processor drives BE7-BE0 with the desired bytes at that time because it does not yet know whether the read will be a single transfer or a burst; this depends on how system logic drives KEN with the first BRDY. If system logic negates KEN, it must return as a single transfer only the bytes specified on BE7-BE0. If system logic asserts KEN, it must ignore BE7-BE0 during all transfers of the burst and return all eight bytes for the starting address on A31-A3. BE7-BE0 does not change during the four transfers of the burst. (This behavior is unlike that of the 486 processor, which drives BE3-BE0 separately for each transfer of a burst.) System logic must determine the successive quadword addresses for each transfer in a burst, depending on the starting address, as shown in Table 5-4 on page 5-21.

During single writes, which include cache writethroughs (1-to- 
8-byte transfers with CACHE negated) the processor drives the 
bits of BE7-BE0 to indicate which of the eight bytes on D63-D0 
are valid. During writebacks (32-byte, four-transfer bursts with 
CACHE asserted) the processor drives all bits of BE7-BE0 Low 
to indicate that all eight bytes on D63-D0 are valid. Write- 
backs are addressed by A31-A3 but they are always aligned to 
32-byte boundaries, so A4-A3 are always 0. 



Table 5-5. Relation Of BE7-BE0 To Other Signals

Byte Enable   Effective Address Bits(1)   Byte On       Data Parity
Output        A2    A1    A0              Data Bus      Bit
BE7           1     1     1               D63-D56       DP7
BE6           1     1     0               D55-D48       DP6
BE5           1     0     1               D47-D40       DP5
BE4           1     0     0               D39-D32       DP4
BE3           0     1     1               D31-D24       DP3
BE2           0     1     0               D23-D16       DP2
BE1           0     0     1               D15-D8        DP1
BE0           0     0     0               D7-D0         DP0

Notes:

1. BE7-BE0 expand on the function of A2-A0 by allowing the processor to address any or all eight bytes addressed by A31-A3.
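The byte-lane relationship in the table above implies a simple rule for system logic: byte lane n occupies data bits 8n+7 through 8n (parity bit DPn) and is valid when BEn is 0. A hypothetical Python helper that applies this rule to a 64-bit data value:

```python
from typing import Dict


def valid_bytes(data: int, be: int) -> Dict[int, int]:
    """Map byte lane -> byte value for lanes enabled by active-Low BE7-BE0.

    Lane n is data bits 8n+7..8n; for example, lane 7 is D63-D56.
    """
    return {lane: (data >> (8 * lane)) & 0xFF
            for lane in range(8) if not (be >> lane) & 1}
```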



The processor differentiates special bus cycles using a combination of BE7-BE0, the cycle definition (D/C, M/IO, and W/R) outputs, and A31-A3. The values on the cycle definition signals are the same for all special cycles; only BE7-BE0 and A31-A3 differentiate among those cycles. Table 5-6 shows the relationships. This function of BE7-BE0 bears no relationship to the D63-D0 data bus. This is particularly apparent in the case of the Branch-Trace Message special bus cycle, during which the value of BE7-BE0 is DFh (1101_1111b) but, in contradiction to the byte-enable bits, the four bytes on D31-D0 carry valid data during both cycles of the operation: during the first cycle, D31-D0 carries the EIP value of the source (branch) instruction; during the second cycle, D31-D0 carries the EIP value of the branch-target instruction.



Table 5-6. Encodings For Special Bus Cycles

BE7-BE0   A31-A3    Special Bus Cycle(1)               Cause
FEh       ...00h    Shutdown                           Triple fault
FDh       ...00h    Cache Invalidation                 INVD instruction
FBh       ...10h    Stop Grant                         STPCLK
FBh       ...00h    Halt                               HLT instruction
F7h       ...00h    Cache Writeback and Invalidation   WBINVD instruction
EFh       ...00h    FLUSH Acknowledge                  FLUSH
DFh       ...00h    Branch-Trace Message(2)            Bit 5 = 1 and bits 3-1 = 001 in the Hardware Configuration Register (HWCR). See Section 7.1 on page 7-3 for details.

Notes:

1. For all special bus cycles, D/C = 0, M/IO = 0, and W/R = 1. System logic must return BRDY in response to each special bus cycle.

2. The message in a Branch-Trace Message special bus cycle is different in the AMD-K5 and Pentium processors.
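A decoder for Table 5-6 can key on the BE7-BE0 value and the low byte of A31-A3 once the cycle definition signals identify a special cycle. The Python sketch below is illustrative; comparing only the low address byte mirrors the "...00h" and "...10h" notation in the table and is an assumption about how much of the address a design would decode.

```python
from typing import Optional

# (BE7-BE0, low byte of A31-A3) -> special bus cycle, from Table 5-6.
SPECIAL_CYCLES = {
    (0xFE, 0x00): "Shutdown",
    (0xFD, 0x00): "Cache Invalidation",
    (0xFB, 0x10): "Stop Grant",
    (0xFB, 0x00): "Halt",
    (0xF7, 0x00): "Cache Writeback and Invalidation",
    (0xEF, 0x00): "FLUSH Acknowledge",
    (0xDF, 0x00): "Branch-Trace Message",
}


def decode_special_cycle(d_c: int, m_io: int, w_r: int,
                         be: int, addr: int) -> Optional[str]:
    """Name the special bus cycle, or None if this is not a special cycle."""
    if (d_c, m_io, w_r) != (0, 0, 1):   # special-cycle definition (Note 1)
        return None
    return SPECIAL_CYCLES.get((be & 0xFF, addr & 0xFF), "unrecognized encoding")
```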



Certain models of the Pentium processor implement BE7-BE5 
as outputs and BE4-BE0 as bidirectional signals. On the 
AMD-K5 processor, however, all eight BE7-BE0 signals are out- 
puts only. 
5.2.9 BF (Bus Frequency), BF1-BF0 for Model 1

Input

Summary
During RESET, BF (BF1-BF0) selects between a high and a low multiplication factor for the frequency ratio between the processor's internal clock and the bus clock (CLK).

Sampled
The processor samples BF (BF1-BF0) only on the falling edge of RESET. The signal must be stable for 10 clocks before it is sampled. BF (BF1-BF0) has a weak internal pullup resistor; see the data sheet for details.

Details

Table 5-7 shows the ratios between the processor clock and the bus clock (CLK) for the High and Low values of BF (BF1-BF0). BF (BF1-BF0) may be tied High or Low. Because of the internal pullup resistor, the lower ratio is selected if BF (BF1-BF0) is left unconnected.



Table 5-7. Processor-to-Bus Clock Ratios

Processor Model   State of BF Input(s)   Processor-Clock to Bus-Clock Ratio(1)
0                 BF = 1                 1.5x
0                 BF = 0                 2.0x
1                 BF1 = 1, BF0 = 1       1.5x
1                 BF1 = 1, BF0 = 0       1.5x
1                 BF1 = 0, BF0 = 1       Reserved
1                 BF1 = 0, BF0 = 0       Reserved

Notes:

1. The default processor-to-bus clock ratios are shown in Table 5-7. Specific models of the AMD-K5 processor may implement different ratios for the High and Low values of BF. For authoritative information, see the data sheet for each AMD-K5 processor model.
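The default ratios in Table 5-7 can be encoded as a lookup from the sampled BF inputs to a multiplier; the internal clock frequency is then the bus clock frequency times that ratio. This Python sketch uses only the default values shown in the table, and the names are hypothetical; as the table's note states, a specific model's data sheet takes precedence.

```python
from typing import Dict, Optional, Tuple

# Default ratios from Table 5-7: (model, BF inputs) -> ratio; None = reserved.
# Model 0 has a single BF input; Model 1 has BF1 and BF0 (BF1 listed first).
BF_RATIOS: Dict[Tuple[int, Tuple[int, ...]], Optional[float]] = {
    (0, (1,)): 1.5,
    (0, (0,)): 2.0,
    (1, (1, 1)): 1.5,
    (1, (1, 0)): 1.5,
    (1, (0, 1)): None,
    (1, (0, 0)): None,
}


def core_clock_mhz(model: int, bf: Tuple[int, ...], bus_mhz: float) -> float:
    """Internal clock frequency for a bus clock (CLK) of bus_mhz."""
    ratio = BF_RATIOS[(model, bf)]
    if ratio is None:
        raise ValueError(f"BF setting {bf} is reserved on model {model}")
    return bus_mhz * ratio
```

Leaving BF unconnected on Model 0 samples 1 through the internal pullup, which selects the (0, (1,)) entry.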
5.2.10 BOFF (Backoff)

Input

Summary
When system logic asserts BOFF, the processor floats the bus and continues to float it until BOFF is negated. If the processor is driving a bus cycle when BOFF is asserted, the cycle is aborted and restarted after BOFF is negated. The processor does not acknowledge BOFF. While BOFF is asserted, another bus master can drive cycles on the bus, including inquire cycles to the processor.

Sampled
The processor samples BOFF in every clock. When BOFF is asserted, the processor floats the cycle-driving outputs on the bus in the next clock and continues to float them until BOFF is negated.

BOFF is sampled during memory cycles (including cache writethroughs and writebacks), I/O cycles, inquire cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; in the Shutdown, Halt, or Stop Grant states; or while AHOLD, RESET, INIT, or PRDY is asserted. BOFF is sampled but not effective when HLDA is asserted. BOFF is not sampled during the Stop Clock state.

Details

The assertion of BOFF, like HOLD but unlike AHOLD, forces 
the processor to relinquish the full address and data bus to 
another bus master. The signal can be used for the following 
purposes: 

■ Bus Turnaround — Another bus master can assert BOFF to 
the processor to obtain control of the bus, allowing the 
other bus master to drive any type of bus cycles. 

■ Inquire Cycles — In multi-master systems with shared mem- 
ory, another bus master typically drives an inquire cycle to 
the processor or its L2 cache prior to driving a read or write 
cycle to any memory locations shared by both masters. Such 
inquire cycles can be driven while BOFF is asserted. 

■ Deadlock Resolution — When an inquire cycle by one master 
hits a modified cache line in another processor, neither mas- 
ter can proceed until the target of the inquire cycle gets the 
bus. In such a case, system logic would back the inquiring 
master off the bus by asserting BOFF to it, so that the mas- 
ter with the modified line can write it back to memory. 
BOFF provides the fastest response of the three bus-hold inputs. Because it can help resolve deadlock problems, it is required in almost all systems with multiple caching masters. In such designs, system logic typically drives separate BOFF signals to each bus master in the system. See Section 6.2.5 on page 6-14 for system configurations using BOFF.

Unlike AHOLD and HOLD, BOFF does not permit an in-progress bus cycle to complete. It forces the processor off the bus in the next clock, aborting any in-progress bus cycle that the processor has begun. A writeback can occur while AHOLD is asserted, but a writeback that is pending during the assertion of BOFF or HOLD waits until after BOFF or HOLD is negated.

The processor floats the bus one clock after the assertion of BOFF. All output and bidirectional signals used for memory or I/O accesses are floated. Table 5-8 shows the signals floated. The same set of signals is floated with HLDA.



Table 5-8. Outputs Floated When BOFF Is Asserted

Address and       Cycle Definition   Data and       Cache
Address Parity    and Control        Data Parity    Control
A31-A3            D/C                D63-D0         CACHE
ADS               LOCK               DP7-DP0        PCD
ADSC              M/IO               N/A            PWT
AP                SCYC               N/A            N/A
BE7-BE0           W/R                N/A            N/A



The processor supports only one in-progress bus cycle; no pending bus cycles are buffered. If the processor is driving a bus cycle when BOFF is asserted, the processor retains the data that had been transferred up to the clock in which BOFF was asserted but ignores the data transferred with or after the assertion of BOFF. BOFF has no effect on writes to the processor's store buffer, except to delay them. (The store buffer is situated between the execution units and the data cache. It holds speculative stores until they are written to the data cache.)

The bus master asserting or causing the assertion of BOFF must wait two clocks after asserting BOFF before driving its first bus cycle, because the processor does not float its outputs until one clock after the assertion of BOFF. System logic or another bus master may continue asserting BOFF for as long as it wants; the processor has no way of breaking the hold. While the processor is backed off, it continues to execute out of its instruction and data caches, if possible. If it can no longer operate out of its caches, it holds BREQ asserted continuously.

As early as one clock after BOFF is negated, the processor restarts, from the beginning, any bus cycle that was aborted when BOFF was asserted. This is unlike BOFF on the 486 processor, which restarts only the transfers that did not complete when BOFF was asserted. The processor can drive another cycle with ADS as early as two clocks after any aborted cycle completes. This allows one idle clock (also called a dead clock) between any two bus cycles. If BOFF was asserted when ADS was also asserted, however, ADS remains Low (floats asserted) after BOFF is negated. In such a case, system logic must properly interpret the state of ADS when it negates BOFF.

If BOFF is asserted during a locked operation, only the cycle(s) aborted before their last BRDY and the cycles not yet run are restarted after BOFF is negated. Thus, system logic must keep track of all cycles in the locked operation that completed before the assertion of BOFF and must continue the locked operation immediately after BOFF is negated, except that if a writeback is pending when BOFF is negated, the writeback takes precedence over the restarting of the aborted cycles in the locked operation.

The processor responds to inquire cycles while BOFF is
asserted and drives HIT and HITM in response to such cycles.
During BOFF-initiated inquire cycles, BOFF can be negated as
early as one clock after EADS is asserted. If HITM is asserted,
which would occur two clocks after EADS is asserted, the
writeback is performed after BOFF is negated. If a processor
cycle was aborted by the assertion of BOFF, that cycle is
restarted as soon as BOFF is negated, except that if an inquire
cycle hit a modified line while BOFF was asserted, the
writeback is driven first when BOFF is negated, before the
aborted cycle is restarted. Multiple inquire cycles are not
permitted to hit modified lines. The processor implements this
restriction by ignoring EADS while HITM is asserted; once
HITM is asserted, it is held asserted until the last BRDY of the
writeback.

If BOFF is asserted when BUSCHK is asserted, BOFF is
recognized and BUSCHK is ignored. For a list of signals
recognized while BOFF is asserted, see Table 5-2 on page 5-8.
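The restart rules in the preceding paragraphs amount to bookkeeping that system logic might keep across a BOFF. The following Python sketch is illustrative only: `BusCycle` and `restart_order` are hypothetical names, not part of any AMD interface, and the model ignores clock-level timing. It assumes a cycle counts as completed once it has received its last expected BRDY, and that a pending inquire-cycle writeback is driven before anything else.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BusCycle:
    """Hypothetical record of one processor bus cycle interrupted by BOFF."""
    name: str
    brdys_expected: int      # 1 for a single transfer, 4 for a burst
    brdys_received: int = 0  # BRDYs returned before BOFF was asserted

    @property
    def completed(self) -> bool:
        # A cycle is finished only after its last expected BRDY.
        return self.brdys_received >= self.brdys_expected


def restart_order(cycles: List[BusCycle],
                  pending_writeback: Optional[str] = None) -> List[str]:
    """Return the order in which bus activity resumes after BOFF is negated.

    A pending writeback (for example, from an inquire cycle that hit a
    modified line while BOFF was asserted) is driven first. Completed
    cycles are not rerun; an aborted cycle restarts from its first
    transfer, followed by the cycles that had not yet run.
    """
    order: List[str] = []
    if pending_writeback is not None:
        order.append(pending_writeback)
    order.extend(c.name for c in cycles if not c.completed)
    return order
```

For example, if the read of a locked read-modify-write had completed but the write had not, `restart_order([BusCycle("locked read", 1, 1), BusCycle("locked write", 1)], "writeback")` returns the writeback followed only by the locked write.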
5.2.11 BRDY (Burst Ready)

Input

Summary For bus cycles that transfer data, system logic must assert
BRDY to indicate that it has received a data transfer on D63-D0
during a write and to indicate that it has placed valid data on
D63-D0 during a read. Up to eight bytes of data (the width of
the D63-D0 data bus) are validated with each BRDY. For special
bus cycles, system logic must assert BRDY either to validate data
or as a simple handshake.

Sampled The processor samples BRDY every clock, from one clock
after ADS until the last expected BRDY of the bus cycle.

BRDY is sampled during memory cycles (including cache
writethroughs and writebacks), I/O cycles, locked cycles, special
bus cycles, and interrupt acknowledge operations in the normal
operating modes (Real, Protected, and Virtual-8086) and in SMM,
or while PRDY is asserted. While AHOLD is asserted, BRDY is
sampled only to complete a bus cycle that had been initiated
before AHOLD was asserted, or for inquire cycle writebacks.
During the Shutdown, Halt, and Stop Grant states, BRDY is sampled
only for inquire cycle writebacks. BRDY is not sampled when the
processor is not driving an external bus cycle, during the Stop
Clock state, or while BOFF, HLDA, RESET, or INIT is asserted.

If BRDY is asserted simultaneously with BOFF, BOFF is recognized
and BRDY is not. If BRDY is asserted simultaneously with HOLD,
however, BRDY is recognized and HOLD waits until the bus cycle
associated with the BRDY completes.

Details BRDY is associated with a transfer of one to eight bytes
on the D63-D0 data bus. During memory and I/O reads, when system
logic asserts BRDY, the processor samples and latches the bytes
on D63-D0 and the parity bits on DP7-DP0 that are enabled by
BE7-BE0. During memory and I/O writes, the processor waits for
system logic to return BRDY before transferring more data on
D63-D0 or before starting another bus cycle. Delays in returning
BRDY for a transfer (and delays in returning EWBE for a write
cycle) are said to add wait states to the transfer, although
these states are nothing more than the absence of an expected
BRDY.

The processor samples BRDY during all types of bus cycles,
including the following:

■ Single-transfer reads 

■ Single-transfer writes (including cache writethroughs) 

■ Burst reads (cache line fills) 

■ Burst writebacks 

■ Special bus cycles 

■ Interrupt acknowledge cycles 

The number of BRDYs expected by the processor depends on 
the type of bus cycle, as follows: 

■ One BRDY for an aligned single-transfer cycle, a special 
bus cycle, or each of two cycles in an interrupt acknowledge 
operation. Additional BRDYs are needed for misaligned 
cycles. 

■ Four BRDYs, one for each data transfer in a burst cycle.
BRDY may be held asserted throughout the four transfers
of the burst.
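A system-logic designer can turn the BRDY counts above into a simple lookup. The sketch below is illustrative; `CycleKind`, `expected_brdys`, and the `split_cycles` parameter are hypothetical. In particular, it assumes that a misaligned access is run as some number of separate single-transfer cycles, each needing one BRDY, and leaves that number to the caller rather than computing it.

```python
from enum import Enum


class CycleKind(Enum):
    SINGLE_TRANSFER = "single-transfer read or write"
    SPECIAL = "special bus cycle"
    INTERRUPT_ACK = "interrupt acknowledge operation"
    BURST = "burst (line fill or writeback)"


def expected_brdys(kind: CycleKind, split_cycles: int = 1) -> int:
    """Number of BRDYs the processor expects for one operation.

    split_cycles is the number of single-transfer cycles a (possibly
    misaligned) access is broken into; it is ignored for other kinds.
    """
    if split_cycles < 1:
        raise ValueError("an access needs at least one cycle")
    if kind is CycleKind.BURST:
        return 4                  # one BRDY per transfer
    if kind is CycleKind.INTERRUPT_ACK:
        return 2                  # one BRDY for each of the two cycles
    if kind is CycleKind.SPECIAL:
        return 1
    return split_cycles           # one BRDY per single-transfer cycle
```

Because the processor ignores any BRDYs beyond the expected count, system logic gains nothing by returning more than this number.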

All data transfers that are not performed as bursts are per- 
formed as one or more single-transfer cycles. For write cycles, 
EWBE must be asserted either with or after BRDY in order for 
any further writes or certain other operations to be performed 
(see the description of EWBE on page 5-62). If system logic 
returns more BRDYs than the processor expects for a single- 
transfer cycle or a burst cycle, the processor ignores them. 

The processor samples the following inputs in the clock in 
which system logic asserts BRDY: 

■ D63-D0 — Every BRDY, for all bus cycles. 

■ DP7-DP0 — Every BRDY, for all bus cycles. 

■ BUSCHK — Every BRDY, for all bus cycles. 

■ EWBE — Every BRDY, for write cycles. 

■ KEN — First BRDY or NA, whichever occurs first, for read 
cycles. 

■ PEN — Every BRDY for read cycles, and second BRDY of 
interrupt acknowledge operations. 

■ WB/WT — First BRDY or NA, whichever occurs first, for 
read and write cycles. 
The assertion of NA acts as an assertion of BRDY only when
the processor samples KEN or WB/WT.
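The sampling rules in the list above can be expressed as a function of the cycle type and the position of the BRDY within the cycle. This is a hedged model rather than a timing specification: the function name, the `kind` strings, and the `na_already_sampled` flag (True when NA was sampled before the first BRDY, so KEN and WB/WT were already taken at NA) are assumptions made for illustration. BRDYs of an interrupt acknowledge operation are numbered across both of its cycles, so its second BRDY is `brdy_index == 2`.

```python
from typing import Set

# Inputs sampled with every BRDY, for all bus cycles.
ALWAYS_SAMPLED = {"D63-D0", "DP7-DP0", "BUSCHK"}


def inputs_sampled_with_brdy(kind: str, brdy_index: int,
                             na_already_sampled: bool = False) -> Set[str]:
    """Inputs the processor samples in the clock in which BRDY is asserted.

    kind is "read", "write", or "intack"; brdy_index counts from 1.
    """
    if kind not in ("read", "write", "intack"):
        raise ValueError(f"unknown cycle kind: {kind}")
    if brdy_index < 1:
        raise ValueError("brdy_index counts from 1")

    sampled = set(ALWAYS_SAMPLED)
    # KEN and WB/WT are taken at the first BRDY or NA, whichever is first.
    first_sample_point = brdy_index == 1 and not na_already_sampled

    if kind == "read":
        sampled.add("PEN")                # every BRDY of a read
        if first_sample_point:
            sampled.update({"KEN", "WB/WT"})
    elif kind == "write":
        sampled.add("EWBE")               # every BRDY of a write
        if first_sample_point:
            sampled.add("WB/WT")
    elif brdy_index == 2:                 # interrupt acknowledge operation
        sampled.add("PEN")
    return sampled
```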

The processor drives or asserts the following outputs relative 
to the assertion of BRDY: 

■ D63-D0 — For single-transfer write cycles, the processor 
drives data from one clock after ADS until BRDY is 
returned. For burst transfers, the processor drives data 
from one clock after ADS until the first BRDY is returned, 
and thereafter from each BRDY until the next BRDY. 

■ DP7-DP0 — Same as D63-D0. 

■ PCHK — Two clocks after every BRDY for reads.


Unlike BRDY on the 486 processor, BRDY on the AMD-K5 and 
Pentium processors is used for both single-transfer and burst 
cycles, and it terminates special bus cycles. On the 486 proces- 
sor, single-transfer cycles and special bus cycles use RDY; 
BRDY is used only for burst cycles. The BLAST output on the 
486 processor is not implemented on the AMD-K5 and Pentium 
processors, which instead use the CACHE output to indicate 
cacheability. However, unlike the 486 processor, which can ter- 
minate a burst cycle prematurely by negating BLAST, the 
AMD-K5 and Pentium processors cannot terminate a burst pre- 
maturely. 
5.2.12 BRDYC (Burst Ready)

Input

Summary BRDYC is an identical copy of BRDY, except that BRDYC
has an internal pullup resistor whereas BRDY does not. In systems
that would otherwise place large capacitive loads on BRDY, the
BRDYC input can be used in place of BRDY to distribute loads,
thereby improving signal timing.

Sampled BRDYC is sampled with the same timing as BRDY. See the
description of BRDY on page 5-41.

Details See the description of BRDY on page 5-41. Unlike BRDY,
BRDYC has an internal pullup resistor.

At the falling edge of RESET, the states of BRDYC and BUS- 
CHK control the drive strength on the A21-A3 (not including 
A31-A22), ADS, HITM, and W/R signals. The drive strength is 
weak for all states of BRDYC and BUSCHK except when 
BRDYC and BUSCHK are both Low, in which case the drive 
strength is strong. The A31-A22 signals use the weak drive 
strength at all times. See the data sheet for details. 
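The drive-strength selection reduces to a two-input decode of the levels present at the falling edge of RESET. The sketch below models only the rule stated above; the function name and the returned dictionary are illustrative, and the electrical meaning of weak and strong is defined in the data sheet, not here.

```python
def reset_drive_strength(brdyc_level: int, buschk_level: int) -> dict:
    """Output drive strength selected by BRDYC and BUSCHK at the falling
    edge of RESET (levels are 0 = Low, 1 = High)."""
    for level in (brdyc_level, buschk_level):
        if level not in (0, 1):
            raise ValueError("levels must be 0 or 1")
    # Strong only when both pins are Low; every other combination is weak.
    selected = "strong" if brdyc_level == 0 and buschk_level == 0 else "weak"
    return {
        "A21-A3": selected,
        "ADS": selected,
        "HITM": selected,
        "W/R": selected,
        "A31-A22": "weak",  # always weak, regardless of BRDYC and BUSCHK
    }
```

Because both BRDYC and BUSCHK have internal pullup resistors, leaving them undriven at RESET would select the weak setting.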
5.2.13 BREQ (Bus Request)

Output

Summary The processor asserts BREQ to indicate that it is driving
a cycle on the bus, performing certain types of cache accesses,
or needs access to the bus in order to continue operating.

Driven The processor asserts BREQ on the first clock of every
processor-initiated bus cycle (with ADS) and in the first clock of
every cache store and cache-tag recovery. The processor asserts
BREQ continuously while it is being held off the bus and can no
longer operate out of its cache.

BREQ is driven during memory cycles (including cache
writethroughs and writebacks), I/O cycles, locked cycles, special
bus cycles, and interrupt acknowledge operations in the normal
operating modes (Real, Protected, and Virtual-8086) and in SMM;
or while AHOLD, BOFF, HLDA, or PRDY is asserted. BREQ is not
driven in the Shutdown, Halt, Stop Grant, or Stop Clock states, or
while RESET or INIT is asserted.

Details The processor observes a bus-parking protocol: it
continues to drive the bus without an arbitration sequence in the
absence of AHOLD, BOFF, or HOLD. System logic can use the
assertion of BREQ to arbitrate bus access among competing bus
masters. If the processor asserts BREQ only on the first clock of
a cache access or bus cycle, system logic need not take action,
whether or not the processor is being held off the bus. If the
processor can no longer operate out of its cache, it holds BREQ
asserted until system logic negates the signal that is holding it
off the bus (AHOLD, BOFF, or HOLD). One clock after the
negation of that signal, the processor drives a bus cycle with ADS.
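One way system logic might act on this rule is to count consecutive clocks of BREQ while it is holding the processor off the bus. The sketch below is a hypothetical arbitration policy, not a requirement of the processor: the class name and the `stall_threshold` of two clocks are assumptions, chosen because a one-clock BREQ needs no action while a continuously asserted BREQ indicates that the processor has stalled. Back-to-back single-clock assertions could look the same to such a counter, so a real design may need a different heuristic.

```python
class BreqMonitor:
    """Hypothetical helper that flags a stalled processor from BREQ."""

    def __init__(self, stall_threshold: int = 2) -> None:
        if stall_threshold < 2:
            raise ValueError("a one-clock BREQ needs no action")
        self.stall_threshold = stall_threshold
        self.consecutive = 0

    def clock(self, breq_asserted: bool, holding_off: bool) -> bool:
        """Call once per CLK; True means release AHOLD, BOFF, or HOLD."""
        if not holding_off:
            self.consecutive = 0      # processor owns (parks on) the bus
            return False
        self.consecutive = self.consecutive + 1 if breq_asserted else 0
        return self.consecutive >= self.stall_threshold
```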
5.2.14 BUSCHK (Bus Check)

Input

Summary System logic can assert BUSCHK if it determines that the
current bus cycle has or will have any type of error. In response,
the processor stores information about the aborted bus cycle
and (optionally) generates a machine check exception. If
machine check exceptions are not enabled, the processor
attempts to continue execution after the assertion of BUSCHK.
The signal is also used at RESET to set the drive strength of the
A21-A3, ADS, HITM, and W/R signals.

Sampled The processor samples BUSCHK with every BRDY, including
the BRDYs of writeback cycles, and recognizes it at the next
instruction boundary. BUSCHK is a level-sensitive interrupt
with an internal pullup resistor. However, unlike other
level-sensitive interrupts, BUSCHK is sampled with every BRDY
and is not acknowledged.

BUSCHK is sampled during memory cycles (including cache
writethroughs and writebacks), I/O cycles, locked cycles, special
bus cycles, and interrupt acknowledge operations in the normal
operating modes (Real, Protected, and Virtual-8086) and in SMM;
or in the Shutdown, Halt, or Stop Grant states. While AHOLD is
asserted, the processor samples BUSCHK only to complete a bus
cycle that had been initiated before AHOLD was asserted, or
during writebacks that result from inquire cycle hits. BUSCHK is
not sampled when the processor is not driving an external bus
cycle, during the Stop Clock state, or while BOFF, HLDA, RESET,
INIT, or PRDY is asserted.

At the falling edge of RESET, the states of BRDYC and BUSCHK
control the drive strength on the A21-A3 (not including
A31-A22), ADS, HITM, and W/R signals. The drive strength is
weak for all states of BRDYC and BUSCHK except when BRDYC and
BUSCHK are both Low, in which case the drive strength is strong.
A31-A22 use the weak drive strength at all times. See the data
sheet for details.

BUSCHK is the highest-priority external interrupt. For details
on its relationship to other interrupts and exceptions, see
Section 5.1.3 on page 5-13 and Table 5-3 on page 5-16.

Details Bus cycle errors such as parity errors can be reported to
the processor on BUSCHK if this reporting is not done on NMI.
The BUSCHK signal is not used in most PC systems, although
higher-end systems may find uses for it in special situations.

Upon recognizing a BUSCHK interrupt at the instruction 
boundary, the processor performs the following actions, in the 
order shown: 

1. Latch Cycle Information — The processor latches the physi- 
cal address and cycle definition of the failed bus cycle in its 
64-bit machine check address register (MCAR) and 64-bit 
machine check type register (MCTR). These registers can 
be read during a service routine with the RDMSR instruction
(ECX = 0 for the MCAR, ECX = 1 for the MCTR). See
Section 3.3.5 on page 3-33 for details on this instruction. 

2. Machine Check Exception (Optional) — If system software has
set the MCE bit in CR4 to 1, the processor waits for the last 
BRDY of the failed bus cycle, then invalidates all instruc- 
tions remaining in the pipeline, saves its state, and gener- 
ates a machine check exception (12h). 

If the MCE bit is cleared to 0, the processor continues exe- 
cution with the next instruction. 
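The two steps above can be illustrated from the system-software side. The sketch below is an assumption-laden illustration, not an AMD-supplied routine: it assumes a Linux host with the `msr` driver loaded, where reading 8 bytes from `/dev/cpu/N/msr` at file offset *index* returns MSR *index* (the equivalent of RDMSR with ECX = index), and it needs root privileges. MSR indexes 0 and 1 are the MCAR and MCTR as stated above, and bit 6 of CR4 is the MCE bit. The helper names are hypothetical.

```python
import os

CR4_MCE = 1 << 6   # CR4.MCE: enable machine check exceptions
MCAR_INDEX = 0     # RDMSR with ECX = 0
MCTR_INDEX = 1     # RDMSR with ECX = 1


def buschk_response(cr4: int) -> str:
    """What the processor does after latching MCAR and MCTR."""
    if cr4 & CR4_MCE:
        return "machine check exception (12h)"
    return "continue with next instruction"


def read_msr(index: int, path: str = "/dev/cpu/0/msr") -> int:
    """Read one 64-bit MSR through the Linux msr device file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, index)   # the file offset selects the MSR
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read of MSR {index:#x}")
    return int.from_bytes(data, "little")


def read_bus_check_registers(path: str = "/dev/cpu/0/msr") -> tuple:
    """Return (MCAR, MCTR): the failed cycle's address and definition."""
    return read_msr(MCAR_INDEX, path), read_msr(MCTR_INDEX, path)
```

The `path` parameter exists so the helpers can be exercised against an ordinary file; on such a file the offset is a byte offset, not an MSR index.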

After asserting BUSCHK, system logic must nevertheless 
return all BRDYs that the processor expects for the type of bus 
cycle that experienced the error: one BRDY for single-transfer 
cycles; four BRDYs for burst cycles. If BUSCHK is asserted 
during a locked operation or inquire cycle, an enabled 
machine check exception will not be acted upon until after the 
last BRDY of the locked operation or after a writeback caused 
by an inquire cycle. If BUSCHK is asserted during the Halt or 
Stop Grant state, the signal is sampled with BRDY but held 
pending until after the processor exits the Halt or Stop Grant 
state, at which point an enabled machine check exception will 
be acted upon. 

If BOFF is asserted when BUSCHK is asserted, BOFF is
recognized and BUSCHK is ignored. The processor does not
recognize BOFF or HOLD while BUSCHK is asserted, but it does
recognize AHOLD if that signal is asserted for the cycle causing
the bus check. The processor latches the assertion of any
edge-triggered interrupt (FLUSH, SMI, INIT, NMI) while
BUSCHK is asserted and recognizes latched interrupts in
priority order when BUSCHK is negated.

The MCE bit in CR4, which enables machine check exceptions
during BUSCHK, also enables machine check exceptions during
data parity errors that are indicated on PCHK while PEN is
asserted.

5.2.15 CACHE (Cacheable Access)

Output 

Summary The processor drives CACHE to specify that the current bus
cycle is a burst cycle. If CACHE is asserted for a read cycle, the
cycle is a four-transfer burst and fills a cache line. If CACHE is
asserted for a write cycle, the cycle is a four-transfer burst
writeback of a modified cache line. CACHE is not asserted for
writethroughs, so the signal is not asserted for all cycles
involving cacheable locations.

Driven and Floated The processor drives CACHE from ADS until the
last expected BRDY of the bus cycle.

CACHE is driven during memory cycles, I/O cycles, locked
cycles, special bus cycles, and interrupt acknowledge operations
in the normal operating modes (Real, Protected, and
Virtual-8086) and in SMM. CACHE is not driven in the Shutdown,
Halt, or Stop Grant states, except for writebacks due to inquire
cycles, and CACHE is never driven during the Stop Clock state
or while BOFF, HLDA, RESET, INIT, or PRDY is asserted.

The processor floats CACHE one clock after system logic
asserts BOFF and in the same clock that the processor asserts
HLDA.

Details The processor asserts CACHE for certain types of unlocked
memory reads, as specified by the operating system, and for all
writebacks (writes of lines cached in the M state). The assertion
of CACHE indicates the processor's intent to drive the read or
write cycle as a 32-byte burst and, in the case of read cycles, to
cache the data or instructions. During reads, system logic can
use the assertion of CACHE to initiate a table lookup of
cacheable addresses. To enable caching in the processor's
instruction or data cache, system logic must assert KEN during
the first BRDY or NA of the bus cycle, whichever comes first. If
either CACHE or KEN is negated when KEN is sampled, the
processor performs a non-cacheable, single-transfer read.

The only type of write cycle for which the processor asserts
CACHE is the 32-byte writeback of modified data. Writebacks
can be caused by (a) externally initiated inquire cycles or
FLUSH operations, (b) processor-initiated internal snoops or
cache line replacements, or (c) program-initiated WBINVD
instructions. By contrast, the processor drives writethroughs
during write hits to shared cache lines and during write misses,
but writethroughs are driven as single transfers of 1 to 8 bytes,
and CACHE is not asserted during them.

CACHE is partially determined by the PCD bit maintained by 
the operating system (in Protected mode, for example, the 
PCD bit is maintained in the page directory and page table 
entries for the accessed page). This is the bit that fully deter- 
mines the processor’s page cache disable (PCD) output. PCD 
indicates a non-cacheable page. Thus, the states of CACHE and 
PCD are very often the same. CACHE is never asserted when 
PCD is asserted. PCD indicates the cacheability of an entire 
page, and CACHE indicates the burstability of a particular bus 
cycle; burstability is a necessary but insufficient condition for 
determining cacheability. The cacheability of a particular bus 
cycle is determined during read cycles when system logic 
asserts KEN while the processor asserts CACHE. KEN is not a 
factor in determining the state of the PCD or CACHE signals. 
The processor drives both PCD and CACHE before it knows 
the state of KEN. For details, see the descriptions of KEN and 
PCD on pages 5-89 and 5-99. 

The MESI state of a cache line is determined at the time of the
line fill by the states of the CACHE, KEN, PWT, and WB/WT
signals. Table 5-9 shows the relationship between these signals
and the data cache MESI states during reads. Read misses with
CACHE or KEN negated are non-cacheable and are driven as
single-transfer cycles on the bus. Read misses with both
CACHE and KEN asserted in the first transfer of the bus cycle
are cacheable, are driven as burst cycles on the bus, and have
their resulting MESI state determined by PWT and WB/WT.
Read hits have their resulting MESI state determined entirely
by their prior MESI state.

For data cache MESI state transitions during writes, see the
description of the WB/WT signal on page 5-133. For more
details on data-cache MESI state transitions and control, and
the correspondence between MESI states and writeback or
writethrough states, see Section 5.2.56 on page 5-133 and
Section 6.2 on page 6-8.

CACHE is not asserted for the following types of memory reads
(M/IO = 1):

■ Locked reads (that is, while LOCK is asserted) 

■ TLB reads 

■ Any read with PCD asserted (PCD is a factor in determining 
the state of CACHE) 



Table 5-9. MESI-State Transitions for Reads

                            Result of Cache Lookup
                            Read Miss                                         | Read Hit
Signal or Event                                                               | shared    exclusive modified
CACHE, PCD (1)              1         -         0         0         0         | -         -         -
KEN                         -         1         0         0         0         | -         -         -
PWT                         -         -         1         -         0         | -         -         -
WB/WT                       -         -         -         0         1         | -         -         -
Cache Line Fill (32 bytes)  no        no        yes       yes       yes       | no        no        no
State After Read (2)        -         -         shared    shared    exclusive | shared    exclusive modified

Notes:

- Don't care or not applicable.

1. The PCD bit is one determinant of the state of CACHE.

2. Transition occurs after any line fill. Lines in "shared" MESI state are said to be in "writethrough" state. Those in "exclusive" or
"modified" MESI states are said to be in "writeback" state.
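Table 5-9 can be read as a small decision function. In the sketch below, signal arguments are the electrical levels shown in the table (CACHE and KEN are asserted Low). PCD is not a separate argument because, as note 1 says, it is one determinant of CACHE. The function name and the use of `None` for a line that is not cached are illustrative assumptions; the sketch covers reads only.

```python
from typing import Optional, Tuple

HIT_STATES = ("shared", "exclusive", "modified")


def read_outcome(prior_state: Optional[str], cache_n: int, ken_n: int,
                 pwt: int, wb_wt: int) -> Tuple[bool, Optional[str]]:
    """Return (line_fill, state_after_read) for a data-cache read.

    prior_state is None for a read miss, or a MESI state for a read hit.
    Levels follow Table 5-9: 0 = Low, 1 = High.
    """
    if prior_state is not None:
        if prior_state not in HIT_STATES:
            raise ValueError(f"not a valid hit state: {prior_state}")
        return False, prior_state   # hits keep their prior state
    if cache_n == 1 or ken_n == 1:
        return False, None          # non-cacheable single-transfer read
    if pwt == 1 or wb_wt == 0:
        return True, "shared"       # line filled in writethrough state
    return True, "exclusive"        # line filled in writeback state
```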



On the 486 processor, by comparison, the CACHE output does 
not exist, but the BLAST output (in conjunction with KEN) 
serves to determine cacheability. Although bursts are typically 
four 32-bit transfers on the 486 processor, they can be longer 
with narrower-width memories. 




AMD-K5 Processor Technical Reference Manual 



18524C/0- Nov 1996 



5.2.16 CLK (Bus Clock)

Input

Summary CLK, in conjunction with the state of BF at RESET,
determines the frequency of the processor's internal clock.

Sampled The processor always samples CLK. The clock must have
begun oscillating prior to the assertion of RESET during
power-up.

All processor signals are driven and sampled relative to the
rising edge of CLK, except the edge-triggered interrupts FLUSH
and SMI, which are sampled on the falling edge of CLK.

Details The processor's internal clock runs at a multiple of CLK
that is determined by the state of the BF input(s) during RESET.
A digital phase-locked loop generates the internal clock from
CLK.

Power consumption can be reduced to its minimum when system
logic turns CLK off. The processor enters its Stop Clock state
when system logic asserts STPCLK (thus entering the Stop Grant
state) and subsequently turns CLK off (thus entering the Stop
Clock state). In the Stop Clock state, the processor's
phase-locked loop and I/O buffers are disabled, except for the
I/O buffers on CLK and the TAP signals. While the processor is
in the Stop Clock state, system logic should not change the state
of any signals other than CLK without first restarting CLK. When
CLK is restarted, the processor returns to the Stop Grant state
and responds to inputs in the next clock, but it cannot drive bus
cycles until its phase-locked loop is synchronized.
Synchronization takes several clocks (see the data sheet for this
specification). For details on STPCLK and the Stop Clock state,
see page 5-122.

While the processor operates with the Test Access Port (TAP),
all TAP events are timed relative to TCK rather than to CLK.

5.2.17 D/C (Data or Code)

Output 

Summary The processor drives D/C to indicate whether it is accessing
data or executable code on the bus. The signal is driven at the
same time as the other two cycle definition signals, M/IO and
W/R. A specific encoding of D/C, M/IO, and W/R identifies one
of several special bus cycles.

Driven and Floated The processor drives D/C from ADS until the
last expected BRDY of the bus cycle.

D/C is driven with the other cycle definition outputs (M/IO and
W/R) and with the BE7-BE0 byte-enable outputs during memory
cycles (including cache writethroughs and writebacks), I/O
cycles, locked cycles, special bus cycles, and interrupt
acknowledge operations in the normal operating modes (Real,
Protected, and Virtual-8086) and in SMM, or while PRDY is
asserted. While AHOLD is asserted, D/C is driven only to
complete a bus cycle that had been initiated before AHOLD was
asserted, or for inquire cycle writebacks. During the Shutdown,
Halt, and Stop Grant states, D/C is driven only for inquire cycle
writebacks. D/C is not driven during the Stop Clock state, or
while BOFF, HLDA, RESET, or INIT is asserted.

The processor floats D/C one clock after system logic asserts
BOFF and in the same clock that the processor asserts HLDA.

Details The processor drives D/C according to whether the access is
initiated by the processor's prefetch or branch logic (indicating
a code access) or its load/store logic (indicating a data access).
In the AMD-K5 processor, code accesses can be done
speculatively, but data accesses are not. Only data (not code)
can be read from the I/O address space, because the cycle
definition for an I/O code read (D/C = 0, M/IO = 0, W/R = 0)
defines an interrupt acknowledge cycle.

Before the processor fetches an instruction or reads or writes a
data operand, it checks the descriptor for the segment
containing the code or data to verify that such action is
allowed. The execute (E) bit in the segment descriptor
distinguishes between data and code segments. A
general-protection exception is generated if the E bit does not
match the D/C type.

During special bus cycles, the processor drives D/C = 0,
M/IO = 0, and W/R = 1. The cycles are then differentiated by
BE7-BE0 and A31-A3.
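System logic can decode the cycle definition outputs with a simple lookup table. Only two encodings are stated in this section (interrupt acknowledge and special bus cycle); the remaining rows below are assumed from the standard Pentium-compatible encoding and should be checked against the bus-cycle definitions elsewhere in this manual and in the data sheet. The table and function names are illustrative.

```python
# Keys are (M/IO, D/C, W/R) levels; 1 = High, 0 = Low.
CYCLE_DEFINITIONS = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "special bus cycle",   # decoded further via BE7-BE0, A31-A3
    (0, 1, 0): "I/O read",
    (0, 1, 1): "I/O write",
    (1, 0, 0): "code read",
    (1, 0, 1): "reserved",
    (1, 1, 0): "memory data read",
    (1, 1, 1): "memory data write",
}


def decode_cycle(m_io: int, d_c: int, w_r: int) -> str:
    """Name the bus cycle selected by the cycle definition outputs."""
    key = (m_io, d_c, w_r)
    if key not in CYCLE_DEFINITIONS:
        raise ValueError(f"levels must be 0 or 1: {key}")
    return CYCLE_DEFINITIONS[key]
```

There is no I/O code read entry, consistent with the text above: its would-be encoding is the interrupt acknowledge cycle.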
5.2.18 D63-D0 (Data Bus)

Bidirectional

Summary The processor drives and samples up to eight bytes on D63-D0
during memory or I/O accesses. System logic must decode the
source and destination of these transfers using the address bus
and various control signals.

As Outputs: For single-transfer writes (including cache 
writethroughs), the processor drives D63-D0 valid from one 
clock after ADS until BRDY. For writebacks (the only type of 
burst write), the processor drives D63-D0 valid from one clock 
after ADS until the first BRDY, and thereafter from one clock 
after each BRDY until the next BRDY of the bus cycle. 

The processor floats D63-D0 one clock after system logic
asserts BOFF and in the same clock that the processor asserts
HLDA.

As Inputs: The processor samples D63-D0 with every BRDY of
the bus cycle.

D63-D0 is driven or sampled during memory cycles (including
cache writethroughs and writebacks), I/O cycles, locked cycles,
special bus cycles, and interrupt acknowledge operations in
the normal operating modes (Real, Protected, and
Virtual-8086) and in SMM, or while PRDY is asserted. While
AHOLD is asserted, D63-D0 is driven or sampled only to complete
a bus cycle that had been initiated before AHOLD was asserted,
or for inquire cycle writebacks. During the Shutdown, Halt, and
Stop Grant states, D63-D0 is driven only for inquire cycle
writebacks. D63-D0 is not driven or sampled during the Stop
Clock state, or while BOFF, HLDA, RESET, or INIT is asserted.

Details Data is transferred between the processor and memory or I/O
on up to eight bytes of the D63-D0 data bus. The BE7-BE0
byte-enable signals specify the validity of each byte on D63-D0.
Table 5-10 shows the relation between D63-D0 and BE7-BE0.
System logic must interpret BE7-BE0 for data byte validation
during single-transfer memory reads and writes and for all I/O
reads and writes. However, for burst reads (cache line fills) and
writes (cache writebacks), that is, when the processor asserts
CACHE, the processor expects data to be valid on all eight bytes
of the data bus and drives valid data on all eight bytes, without
regard to the state of BE7-BE0.
Table 5-10. Relation Between D63-D0, BE7-BE0, and DP7-DP0

Byte On Data Bus    Byte Enable Output    Data Parity Bits
D63-D56             BE7                   DP7
D55-D48             BE6                   DP6
D47-D40             BE5                   DP5
D39-D32             BE4                   DP4
D31-D24             BE3                   DP3
D23-D16             BE2                   DP2
D15-D8              BE1                   DP1
D7-D0               BE0                   DP0
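Table 5-10 implies a fixed lane mapping: byte lane i is D(8i+7)-D(8i), enabled by BEi and covered by parity bit DPi. The sketch below shows how system logic, or a simulation of it, might select the enabled bytes of a single-transfer cycle. It assumes BE7-BE0 are packed into one integer with BE0 as bit 0 and, as stated for writebacks, that a Low byte enable marks a valid byte. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def lane_signals(lane: int) -> Tuple[str, str, str]:
    """Data bits, byte enable, and parity bit for byte lane 0-7."""
    if not 0 <= lane <= 7:
        raise ValueError("lane must be 0-7")
    return f"D{8 * lane + 7}-D{8 * lane}", f"BE{lane}", f"DP{lane}"


def enabled_lanes(be_levels: int) -> List[int]:
    """Lanes whose byte enable is Low (BE0 is bit 0 of be_levels)."""
    if not 0 <= be_levels <= 0xFF:
        raise ValueError("BE7-BE0 is an 8-bit field")
    return [lane for lane in range(8) if not (be_levels >> lane) & 1]


def enabled_bytes(data: int, be_levels: int) -> Dict[int, int]:
    """Map each enabled lane to its byte from the 64-bit D63-D0 value."""
    return {lane: (data >> (8 * lane)) & 0xFF
            for lane in enabled_lanes(be_levels)}
```

For example, with BE7-BE0 = FCh (only BE1 and BE0 Low), `enabled_bytes(0x1122334455667788, 0xFC)` keeps lanes 0 and 1, that is, 88h on D7-D0 and 77h on D15-D8.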



During burst reads the processor drives BE7-BE0 to identify 
only the byte address of the next desired operand. The byte 
indication does not change throughout the burst; it continues 
to be driven on BE7-BE0 during all four transfers. The memory 
subsystem must ignore BE7-BE0 during the second, third, and 
fourth transfers of a burst and return all eight bytes corre- 
sponding to the eight-byte address on A31-A3. Furthermore, 
the memory subsystem must determine the successive 
addresses, depending on the starting address that the proces- 
sor drives on A31-A3, as described in Table 5-4 on page 5-21. 
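Table 5-4 is not reproduced in this section. As a sketch only, the function below assumes the interleaved burst order used by the Pentium processor, in which address bits A4-A3 of transfer i are the starting A4-A3 XORed with i; verify the result against Table 5-4 before relying on it. The function name is hypothetical, and addresses are byte addresses with A2-A0 taken as 0.

```python
from typing import List


def burst_addresses(start: int) -> List[int]:
    """Quadword addresses of the four transfers of a 32-byte burst.

    Assumes Pentium-style interleaved ordering: A4-A3 of transfer i are
    the starting A4-A3 XOR i. A31-A5 stay fixed for the whole line.
    """
    if start < 0:
        raise ValueError("address must be non-negative")
    line = start & ~0x1F          # A31-A5: the 32-byte line
    first = (start >> 3) & 0x3    # A4-A3 of the first transfer
    return [line | ((first ^ i) << 3) for i in range(4)]
```

Under this assumption, a writeback, which always starts with A4-A3 = 0, transfers its quadwords in ascending address order.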

During writebacks the processor drives all bits of BE7-BE0 
Low to indicate that all eight bytes on D63-D0 are valid. Write- 
backs are addressed by A31-A3, but they are always aligned to 
32-byte boundaries so that A4-A3 are always 0. 

If memory reads, memory writes, or I/O reads are misaligned, 
the Pentium processor transfers the highest-addressed portion 
followed by the lowest-addressed portion. The AMD-K5 proces- 
sor runs such cycles in the opposite order from the Pentium 
processor. I/O writes, however, are performed in the same 
order on both processors. 
5.2.19 DP7-DP0 (Data Parity)

Bidirectional

Summary DP7-DP0 carry the even-parity bits for each byte driven and
sampled on the D63-D0 data bus. While DP7-DP0 are outputs,
system logic can use the signals to check parity. While DP7-DP0
are inputs, the processor uses them to determine the state of
the PCHK output.

Driven, Sampled, and Floated DP7-DP0 are driven, sampled, and
floated with the same timing as D63-D0. See the description of
D63-D0 on page 5-55.

Details DP7 corresponds to the high byte on the data bus (D63-D56)
and DP0 corresponds to the low byte on the data bus (D7-D0).
To determine data parity, the bits of each byte on D63-D0 are
considered together with that byte's parity bit on DP7-DP0. For
example, if the total number of 1 bits in D63-D56 and DP7 is
even, the byte is considered free of error (hence the term even
parity). If the number of 1 bits is odd, the byte is considered
to have an error.

During single-transfer read cycles, parity is checked only for
the bytes enabled by BE7-BE0. During burst reads, parity is
checked for all eight bytes, regardless of BE7-BE0. If a parity
error is detected on a read, the processor asserts PCHK.

Systems that do not implement data parity generation and
checking should tie DP7-DP0 either High or Low and ignore
the PCHK output. In addition to generating and checking data
parity, the processor also generates and checks address parity
using the AP and APCHK signals. See pages 5-31 and 5-32 for
details.
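The even-parity rule can be restated as: DPi is 1 exactly when byte lane i contains an odd number of 1 bits, so that the byte and its parity bit together hold an even count. The sketch below computes DP7-DP0 for a 64-bit data value and applies the checking rule described above (enabled bytes only for single transfers, all eight bytes for bursts). The function names and the packing of DP7-DP0 and BE7-BE0 into integers (bit i for lane i) are assumptions; the sketch models whether PCHK would be asserted, not its timing.

```python
def even_parity_bits(data: int) -> int:
    """DP7-DP0 for a 64-bit D63-D0 value (bit i is DPi)."""
    dp = 0
    for lane in range(8):
        byte = (data >> (8 * lane)) & 0xFF
        # An odd number of 1 bits needs DPi = 1 to make the total even.
        dp |= (bin(byte).count("1") & 1) << lane
    return dp


def read_parity_error(data: int, dp: int, be_levels: int = 0x00,
                      burst: bool = False) -> bool:
    """True if the processor would report a data parity error on a read.

    be_levels holds BE7-BE0 (Low = enabled); bursts check all lanes.
    """
    if burst:
        lanes = range(8)
    else:
        lanes = [i for i in range(8) if not (be_levels >> i) & 1]
    mismatches = even_parity_bits(data) ^ dp
    return any((mismatches >> lane) & 1 for lane in lanes)
```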
5.2.20 EADS (External Address Strobe)

Input

Summary While system logic holds the processor off the address bus,
system logic can assert EADS and drive a cache line address to
initiate an inquire cycle. Inquire cycles cause the processor to
snoop its internal caches.

Sampled The processor samples EADS every clock, beginning two
clocks after the assertion of AHOLD or BOFF, or one clock after
the assertion of HLDA; except while the processor drives
A31-A3, while it asserts HITM, and one clock after EADS.

While AHOLD is asserted, EADS is sampled while the processor
finishes an in-progress memory cycle (including a cache
writethrough or writeback), I/O cycle, locked cycle, special bus
cycle, or interrupt acknowledge operation in the normal
operating modes (Real, Protected, and Virtual-8086) and in SMM.
While AHOLD, BOFF, or HLDA is asserted, EADS is always
sampled while the processor operates out of its cache or is idle;
or is in the Shutdown, Halt, or Stop Grant state; or while INIT
or PRDY is asserted. EADS is not sampled in the Stop Clock
state or while RESET is asserted.

If BOFF and EADS are both asserted in the same clock that
AHOLD is negated, EADS is not recognized. If EADS is
asserted in the same clock that HOLD is negated, both the
AMD-K5 and the Pentium processors recognize this as a valid
inquire cycle and process it correctly. However, if EADS is
asserted in the clock following the negation of HOLD, the
AMD-K5 processor does not recognize this as a valid inquire
cycle.

Details Inquire cycles cause the processor to compare a physical
address driven by system logic with the processor's physical
address tags for its instruction and data caches. Inquire cycles
can occur in parallel with the processor's own cache accesses,
which are done through a separate set of linear address tags.

Inquire cycles are sometimes called snoop cycles, although the 
term snoop means at least three different things: (a) external 
snoop cycles that are occasionally driven on the bus by system 
logic, such as an inquire cycle, (b) internal snoops that are 
done automatically whenever the processor accesses its cache, 
such as when the processor compares the address of a write to 
its data cache with the addresses in the instruction cache, and
(c) automatic bus watching, in which a caching device con- 
stantly compares addresses being driven by any other device 
on the address bus with its own cached addresses. The AMD-K5 
and Pentium processors only support the first two types of 
snooping, not the third. 

There are three methods by which system logic can obtain control of the address bus prior to running one or more inquire cycles: AHOLD, BOFF, or HOLD. While it has control of at least the address bus, system logic can drive inquire cycles using EADS, A31-A5, INV, and (optionally) AP.

The system logic’s sequence for driving inquire cycles is as fol- 
lows: 

1. Assert AHOLD, BOFF, or HOLD to obtain at least the 
address bus. 

2. Assert EADS two clocks after asserting AHOLD or BOFF, or 
one clock after the processor asserts HLDA, and simulta- 
neously drive INV and a cache-line address on A31-A5. The 
processor latches the address on A31-A5 when EADS is 
asserted. 

3. Wait two clocks, watching for HITM and/or HIT to be asserted:

• If neither HIT nor HITM is asserted at the end of two clocks, or if only HIT is asserted, the inquire cycle terminates. Because HITM is negated, EADS can be asserted again in that same clock.

• If HITM is asserted, a writeback follows and the processor does not recognize EADS again until one clock after the last BRDY of the writeback. The timing of the writeback depends on whether AHOLD, BOFF, or HOLD was asserted to gain access to the bus. If AHOLD was used, the processor begins driving the four-transfer burst writeback as early as two clocks after asserting HITM, whether or not AHOLD is still asserted. If BOFF or HOLD was used, the processor delays the writeback until after BOFF or HLDA is negated.
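The clock arithmetic in steps 1 to 3 can be summarized as a small calculation. The sketch below is a hypothetical Python model (the names `plan_inquire` and `InquirePlan`, and the integer clock numbering, are illustrative assumptions, not part of any AMD tool). It computes the earliest EADS clock for each bus-hold method, the clock in which HIT and HITM reflect the lookup, and when the next EADS may follow.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the system-logic inquire sequence; clocks are integers.
AHOLD, BOFF, HOLD = "AHOLD", "BOFF", "HOLD"


@dataclass(frozen=True)
class InquirePlan:
    eads: int                 # earliest clock in which system logic asserts EADS
    result: int               # clock in which HIT/HITM reflect the cache lookup
    next_eads: Optional[int]  # earliest next EADS, or None if a writeback must finish


def plan_inquire(method: str, acquired: int, hits_modified: bool) -> InquirePlan:
    """Plan one inquire cycle.

    acquired: clock in which system logic asserted AHOLD or BOFF, or, for
    HOLD, the clock in which the processor asserted HLDA.
    """
    if method in (AHOLD, BOFF):
        eads = acquired + 2   # step 2: two clocks after AHOLD or BOFF
    elif method == HOLD:
        eads = acquired + 1   # step 2: one clock after HLDA
    else:
        raise ValueError(f"unknown bus-hold method: {method!r}")
    result = eads + 2         # step 3: HIT/HITM valid two clocks after EADS
    # A miss or clean hit ends the cycle, so EADS may be asserted again at once.
    # A modified hit holds HITM until one clock after the writeback's last BRDY,
    # which this sketch does not know, so no next-EADS clock is returned.
    next_eads = None if hits_modified else result
    return InquirePlan(eads, result, next_eads)
```

For example, `plan_inquire(AHOLD, 10, hits_modified=False)` yields `InquirePlan(eads=12, result=14, next_eads=14)`.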

To prevent multiple inquire cycles from hitting modified lines and causing a backlog of writebacks, the processor does not recognize another EADS while HITM is asserted. HITM is negated one clock after the last BRDY of the writeback, at which time another EADS can be asserted.

Signal Descriptions 5-59

AMD-K5 Processor Technical Reference Manual 18524C/0-Nov 1996

If AHOLD is held asserted throughout an inquire cycle, system 
logic must latch the inquire cycle address when EADS is 
asserted. This is required so that, if the inquire cycle hits a 
modified line, the address used for the writeback need not be 
driven by the processor when the processor asserts ADS for the 
writeback; instead, system logic must use its latched copy of 
the inquire cycle address. By contrast, if system logic always 
negates AHOLD before the writeback, the processor will drive 
the writeback address when it asserts ADS for the writeback, 
and system logic need not latch a copy of the inquire cycle 
address. 

If EADS is asserted in the same clock that HOLD is negated, 
the processor recognizes this as a valid inquire cycle. However, 
if EADS is asserted in the clock following the negation of 
HOLD, the processor does not recognize this as a valid inquire 
cycle. 

Inquire cycles can be implemented for every memory access by 
another caching master. To do this, system logic can generate 
EADS to the processor using the equivalent of ADS from the 
other caching master. 

An inquire cycle can hit a line that is in the process of being 
written back for a reason other than the inquire, such as when 
the writeback is being done to make room in the cache for a 
new line (called a replacement writeback) or when the 
WBINVD (writeback and invalidate) instruction is being exe- 
cuted. If this occurs, the in-progress writeback completes but 
the system must recognize that this writeback was for the same 
line that was the subject of the inquire cycle. The processor
will not repeat the writeback, but it will assert HITM.

If an inquire cycle occurs during a Branch-Trace Message spe- 
cial cycle, the branch-address information driven by the pro- 
cessor on A31-A3 can be overwritten by the inquiring bus 
master. In such cases, system logic should latch A31-A3 when 
ADS is asserted (that is, before asserting AHOLD, BOFF, or
HOLD).

EADS should not be asserted at the same time the processor is 
running a BIST (INIT asserted on the falling edge of RESET) or 
while the TAP instruction RUNBIST is executed. The processor accesses the physical tag array during both BISTs and
inquire cycles via EADS, and these accesses can conflict. 

The 486 processor without a writeback cache samples EADS in every clock, including while the processor drives the address bus. It can thus support inquire cycles every clock. The AMD-K5 and Pentium processors, by comparison, can sample EADS at most every other clock, so the maximum inquire or invalidation rate with inquire cycles is one every two clocks: HIT and HITM change state two clocks after EADS, and EADS can be asserted in the same clock in which HITM is negated. The AMD-K5 processor does not sample EADS in the clock after a valid EADS assertion.
5.2.21 EWBE (External Write Buffer Empty)

Input

Summary The processor delays cache writes and certain serializing instructions if system logic negates EWBE during external writes.

Sampled The processor samples EWBE with the BRDY of external write cycles and in every clock thereafter until EWBE is asserted.

Details All writes on the AMD-K5 processor (whether to cache, memory, or I/O) are performed in program order, regardless of the state of EWBE. The only effect of EWBE on writes is to hold off additional writes while the signal is negated.

The processor expects EWBE to be asserted with or after the 
last BRDY of each write cycle. Thus for writebacks, the proces- 
sor expects EWBE to be asserted with or after the BRDY of the 
fourth transfer. System logic should assert EWBE when all 
external write buffers are empty, thus indicating that the write 
to memory or I/O has completed and that writes to the cache 
can take place. Most systems tie EWBE Low (asserted), thus 
allowing the speed of writes to be controlled only by BRDY. 

If EWBE is sampled negated with the BRDY of an external 
write cycle, the processor does not do any of the following: 

■ Write store-buffer entry to data cache 

■ Write to memory (single-transfer or burst), including locked 
write to Accessed (A) bit after TLB load 

■ Execute serializing instructions such as MOV to CR0, MOV to CR4, WBINVD, INVLPG, and CPUID

■ Respond to the following interrupts: 

• FLUSH 

• SMI 

■ Respond to any other interrupts or exceptions that cause a 
write to memory, such as pushing state onto the stack or set- 
ting the Accessed bit in a segment descriptor. This may 
include the BUSCHK, NMI, and INTR interrupts. 

For interrupts that do not write to memory (R/S, INIT, and 
STPCLK), the state of EWBE has no effect on the processor’s 
recognition of or response to such interrupts. The processor 
latches any edge-triggered interrupt that may not be recognized while EWBE is negated (FLUSH, SMI, NMI) and recognizes these interrupts in priority order when EWBE is asserted.

If system logic implements memory-mapped I/O as non-cache- 
able memory (the standard method), EWBE on the AMD-K5 
processor has the same effect on writes to memory-mapped I/O 
as does EWBE on the Pentium processor — neither processor 
reorders reads ahead of writes. 
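The hold-off rules above amount to a lookup against three groups of operations. The sketch below is a hypothetical Python encoding of those lists. The operation labels are illustrative strings, not processor mnemonics or bus encodings, and the serializing-instruction set contains only the examples named in the text.

```python
# Hypothetical labels for the operations listed above.
HELD_OFF = {
    "store-buffer write to data cache",
    "memory write",
    # Serializing instructions named in the text (not an exhaustive list).
    "MOV to CR0", "MOV to CR4", "WBINVD", "INVLPG", "CPUID",
    # Edge-triggered interrupts that are latched and held off.
    "FLUSH", "SMI",
}
HELD_OFF_IF_WRITING = {"BUSCHK", "NMI", "INTR"}  # only if the response writes memory
NOT_AFFECTED = {"R/S", "INIT", "STPCLK"}         # interrupts that never write memory


def may_proceed(op: str, ewbe_negated: bool, writes_memory: bool = False) -> bool:
    """Return True if op may proceed.

    ewbe_negated: EWBE was sampled negated with the BRDY of an external
    write and has not yet been asserted.
    """
    if not ewbe_negated or op in NOT_AFFECTED:
        return True
    if op in HELD_OFF_IF_WRITING:
        return not writes_memory
    if op in HELD_OFF:
        return False
    raise ValueError(f"operation not covered by this sketch: {op!r}")
```

In a system that ties EWBE Low, `ewbe_negated` is always False and every operation proceeds, so write timing is governed by BRDY alone.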

For more details on the function of EWBE, see the following 
sections: 

■ BRDY — Page 5-41. 

■ HITM — Page 5-72. 

■ SMI — Page 5-116. 

■ SMIACT— Page 5-121. 

■ STPCLK— Page 5-122. 
5.2.22 FERR (Floating-Point Error)

Output

Summary The processor asserts FERR to report the occurrence of an unmasked floating-point exception resulting from the execution of a floating-point instruction. This signal is provided to allow system logic to handle this exception in a manner consistent with IBM-compatible PC/AT systems.

The state of the numeric error (NE) bit in CR0 does not affect the FERR signal.

Driven The processor drives FERR every clock during memory cycles (including cache writethroughs and writebacks), cache hits of all types, I/O cycles, and locked cycles in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM. FERR is not driven during the Shutdown, Halt, Stop Grant, or Stop Clock states, or while RESET, INIT, or PRDY is asserted.

Details The processor asserts FERR on the instruction boundary of the next floating-point instruction or WAIT instruction following the floating-point instruction that caused the unmasked floating-point exception; that is, FERR is not asserted at the time the exception occurs. The IGNNE signal does not affect the assertion of FERR.

FERR is negated under the following conditions:

■ Following the successful execution of the floating-point 
instructions FCLEX, FINIT, FSAVE, and FSTENV 

■ Under certain circumstances, following the successful exe- 
cution of the floating-point instructions FLDCW, FLDENV, 
and FRSTOR, which load the floating-point status word or 
the floating-point control word 

■ Following the rising transition of RESET 

FERR is always driven except in Tri-State Test mode. 
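The deferred assertion of FERR can be modeled as a two-flag state machine. The following hypothetical Python sketch (class and method names are illustrative) records an unmasked exception as pending, asserts FERR at the next floating-point or WAIT boundary, and negates it after FCLEX, FINIT, FSAVE, or FSTENV completes, or after RESET. The conditional negation by FLDCW, FLDENV, and FRSTOR is not modeled.

```python
# Instructions whose successful completion negates FERR (from the list above).
CLEARS_FERR = {"FCLEX", "FINIT", "FSAVE", "FSTENV"}


class FerrModel:
    """Hypothetical model of when FERR is asserted and negated."""

    def __init__(self) -> None:
        self.pending = False  # unmasked exception occurred but not yet reported
        self.ferr = False

    def unmasked_exception(self) -> None:
        # FERR is not asserted at the time the exception occurs.
        self.pending = True

    def at_boundary(self, insn: str, is_fp: bool) -> bool:
        """Call at the boundary of the next instruction; returns FERR."""
        if self.pending and (is_fp or insn == "WAIT"):
            self.ferr = True
        return self.ferr

    def completed(self, insn: str) -> None:
        if insn in CLEARS_FERR:
            self.pending = False
            self.ferr = False

    def reset(self) -> None:
        # FERR is negated following the rising transition of RESET.
        self.pending = False
        self.ferr = False
```

After `unmasked_exception()`, an integer instruction such as MOV leaves FERR negated; the next FADD or WAIT asserts it.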
5.2.23 FLUSH (Cache Flush)

Input

Summary FLUSH causes the processor to write back (if necessary) and invalidate each line in its data and instruction caches. The processor generates a flush-acknowledge special bus cycle at the end of the entire operation. The signal is also used to invoke an output-float test at RESET.

The processor samples FLUSH every clock and recognizes it at 
the next instruction boundary. FLUSH is a falling-edge-trig- 
gered interrupt and is latched when sampled. When FLUSH is 
recognized, the processor acknowledges it by driving a flush- 
acknowledge special bus cycle after all modified lines in the 
data cache are written back and after all lines in both caches 
are invalidated. 

FLUSH is sampled during memory cycles (including cache 
writethroughs and writebacks), cache accesses, I/O cycles, 
locked cycles, special bus cycles, and interrupt acknowledge 
operations in the normal operating modes (Real, Protected, 
and Virtual-8086) and in SMM; or in the Shutdown, Halt, or 
Stop Grant states; or while AHOLD, BOFF, HLDA, or RESET is
asserted. FLUSH is not sampled in the Stop Clock state, or 
while INIT or PRDY is asserted. 

If asserted at the falling edge of RESET, FLUSH invokes the 
processor’s three-state (float) test. System logic can drive the 
signal either synchronously or asynchronously (see the data 
sheet for synchronously driven setup and hold times). 

FLUSH is the third-highest-priority external interrupt. For 
details on its relationship to other interrupts and exceptions, 
see Section 5.1.3 on page 5-13 and Table 5-3 on page 5-16. 

Details FLUSH allows system logic to control the data that the processor sees during cache accesses after changing operating modes or data environments. It also provides control for special cache-coherency purposes. For example, FLUSH may be asserted when the processor enters SMM, or in systems running extended memory managers if there is any change that may affect physical addresses. Depending on how an L2 cache serves the processor and other caching devices, system logic may want to cause the L2 cache to invalidate the same locations when system logic asserts FLUSH to the processor, or it may use the Flush-Acknowledge special bus cycle to initiate such action.

Entry into SMM may require the assertion of FLUSH. If the 
SMM physical memory space overlaps physical main memory 
that is cacheable, FLUSH must be asserted with SMI (the 
FLUSH will be performed first, because it is a higher-priority 
interrupt). If this is not done, accesses to the SMM memory 
space after entering SMM may hit cached locations in the main 
memory space. In addition, if SMM memory is itself cacheable, 
the SMM service routine should execute the WBINVD (write- 
back and invalidate) instruction when leaving SMM, just prior 
to executing the RSM instruction. 

The processor performs the FLUSH operation using the same 
microcode that executes for the WBINVD (writeback and inval- 
idate) instruction. The only difference is the special bus cycle 
driven upon completion of the operation. A writeback and 
invalidation operation can be time consuming because all mod- 
ified lines in the data cache are written back to memory. If 
writebacks are not required, the INVD instruction or RESET 
can be used to invalidate all contents of the caches. 

When FLUSH is recognized at an instruction boundary, the 
processor performs the following actions in the order shown: 

1. Flush Pipeline — The processor invalidates all instructions 
remaining in the pipeline. 

2. Writeback and Invalidate — The processor writes back any 
modified lines in the data cache, and then (after all write- 
backs) simultaneously invalidates all lines in the instruc- 
tion and data caches. The invalidations are done by clearing 
the valid bits in both the linear and physical tag directories. 

3. Acknowledge — After the writeback and invalidation completes, the processor drives a FLUSH-acknowledge special bus cycle. This cycle is identified by D/C = 0, M/IO = 0, W/R = 1, BE7-BE0 = EFh, and A31-A3 = 0. System logic must return BRDY in response to this cycle.
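System logic that must return BRDY to the flush-acknowledge cycle first has to recognize it. The sketch below is a hypothetical Python decoder for a captured bus cycle (the `BusCycle` record and its field names are illustrative). It checks only the flush-acknowledge encoding given in step 3 and makes no claim about other special cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusCycle:
    """Signal levels captured with ADS (hypothetical capture record)."""
    d_c: int   # D/C
    m_io: int  # M/IO
    w_r: int   # W/R
    be: int    # BE7-BE0 as an 8-bit value, bit 7 = BE7
    a: int     # A31-A3


def is_flush_acknowledge(cycle: BusCycle) -> bool:
    """True for the flush-acknowledge special bus cycle encoding."""
    return (cycle.d_c == 0 and cycle.m_io == 0 and cycle.w_r == 1
            and cycle.be == 0xEF and cycle.a == 0)
```

A decoder like this might, for instance, trigger the L2 invalidation mentioned under Details before returning BRDY.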

AHOLD, BOFF, and HOLD are all recognized and behave normally while FLUSH is asserted, and they will intervene in an in-progress FLUSH operation. For example, if BOFF is asserted while a FLUSH operation is writing modified lines back to memory, an in-progress writeback will be aborted. When BOFF is subsequently negated, the writeback is restarted and the FLUSH operation continues from where it left off. Any writebacks that completed before BOFF was asserted are not affected by BOFF’s intervention.

If FLUSH is asserted while AHOLD, BOFF, or HLDA is asserted, the outcome of the flush depends on whether the flush causes writebacks of modified lines. If no writebacks are needed, the processor invalidates all lines but does not perform the FLUSH-acknowledge cycle until the processor gets control of the bus again. If a writeback is needed, the processor
stops at that writeback, without having invalidated any lines, 
waits until control of the bus is returned to the processor, then 
completes the FLUSH operation. If FLUSH is asserted during 
the Stop Grant state, the signal is held pending until after the 
processor exits the Stop Grant state, at which point it is acted 
upon. 
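The two outcomes described above differ only in where the processor waits for the bus. The hypothetical Python sketch below produces an event trace of a FLUSH operation (the event strings and function name are illustrative). It assumes the bus is held only when FLUSH is recognized and is not taken away again partway through.

```python
from typing import List


def flush_events(modified_lines: List[int], bus_held: bool) -> List[str]:
    """Hypothetical event trace of a FLUSH operation.

    modified_lines: addresses of modified data-cache lines to write back.
    bus_held: True if AHOLD, BOFF, or HLDA is asserted when FLUSH is recognized.
    """
    events = ["flush pipeline"]
    if bus_held and not modified_lines:
        # No writebacks needed: invalidate now, acknowledge once the bus returns.
        return events + ["invalidate all lines", "wait for bus", "flush-acknowledge"]
    if bus_held:
        # Stop at the first writeback without having invalidated any lines.
        events.append("wait for bus")
    events += [f"writeback {addr:#x}" for addr in modified_lines]
    # Both caches are invalidated together, only after all writebacks.
    events += ["invalidate all lines", "flush-acknowledge"]
    return events
```

With one modified line at 1000h and the bus held, the trace waits for the bus before the writeback, and no line is invalidated until the writeback completes.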

No other interrupt or exception will intervene in a flush opera- 
tion because such interrupts are not recognized until after the 
FLUSH-Acknowledge special bus cycle, which occurs at the 
end of all writebacks and invalidations. The processor latches 
the assertion of any edge-triggered interrupt (FLUSH, SMI, 
INIT, NMI) while FLUSH is asserted and recognizes latched 
interrupts in priority order when FLUSH is negated. 

The Three-State (float) Test mode, entered if FLUSH is 
asserted during RESET, causes the processor to float all of its 
output and bidirectional signals. In this isolated state, system 
board traces and connections can be tested for integrity and 
driveability. The Float-Test mode can only be exited by assert- 
ing RESET again. 

On the AMD-K5 and Pentium processors, FLUSH is an edge-triggered interrupt. On the early 486 processors, however, the signal is a level-sensitive input.
5.2.24 FRCMC (Functional-Redundancy Check Master/Checker)

Input

Summary If FRCMC is asserted at RESET, the processor enters Functional-Redundancy Checking mode as the checker and reports checking errors on the IERR output. If FRCMC is negated at RESET, the processor operates normally, and it can also act as the master in a functional-redundancy checking arrangement with a checker.

Sampled The processor samples FRCMC at the falling edge of RESET. The processor does not sample FRCMC at any other time.

System logic can drive the signal either synchronously or asynchronously (see the data sheet for synchronously driven setup and hold times).

Details In the Functional-Redundancy Checking mode, two processors have their signals tied together. One processor (the master) operates normally. The other processor (the checker) has its output and bidirectional signals (except for TDO and IERR) floated to detect the state of the master’s signals. The master controls instruction fetching, and the checker mimics its behavior by sampling the fetched instructions as they appear on the bus. Both processors execute the instructions in lock step. The checker compares the state of the master’s output and bidirectional signals with the state that the checker itself would have driven for the same instruction stream. Errors detected by the checker are reported on the checker’s IERR output. On the AMD-K5 processor, the IERR output is reserved solely for functional-redundancy checking; no other errors are reported on that output.

Functional-redundancy checking is typically implemented on 
single-processor, fault-monitoring systems (which actually 
have two processors). The master processor runs the opera- 
tional programs and the checker processor is dedicated 
entirely to constant checking. In this arrangement, the test of 
accurate operation consists solely of reporting one or more 
errors; the particular type of error or the instruction causing 
an error is not reported. The arrangement works because the 
processor is entirely deterministic. Speculative prefetching, 
speculative execution, and cache replacement all occur in 
identical ways and at identical times on both processors, if 
their signals are tied together so that they run the same program.

The Functional-Redundancy Checking mode can only be 
exited by the assertion of RESET. Functional-redundancy 
checking cannot be done in the Hardware Debug Tool (HDT) 
mode. The assertion of FRCMC is not recognized while PRDY 
is asserted. 
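The checker’s comparison can be pictured as a per-clock equality test over the shared signals. The sketch below is a hypothetical Python model operating on recorded signal traces (the trace format and function name are illustrative; the real comparison is done in hardware). Like the real IERR output, it reports only that a mismatch occurred, not which signal or instruction caused it.

```python
from typing import Dict, List

# Signals the checker drives itself instead of comparing (from the text above).
NOT_COMPARED = {"TDO", "IERR"}


def checker_ierr(master: List[Dict[str, int]],
                 expected: List[Dict[str, int]]) -> List[bool]:
    """Per-clock IERR: True where any compared signal differs.

    master: levels observed on the shared bus each clock (driven by the master).
    expected: levels the checker would have driven for the same instructions.
    """
    if len(master) != len(expected):
        raise ValueError("traces must cover the same clocks")
    ierr = []
    for seen, mine in zip(master, expected):
        names = (seen.keys() | mine.keys()) - NOT_COMPARED
        ierr.append(any(seen.get(n) != mine.get(n) for n in names))
    return ierr
```

A difference on TDO is ignored, while a difference on ADS in any clock produces an error indication for that clock.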
5.2.25 HIT (Inquire-Cycle Hit)

Output

Summary The processor asserts HIT to indicate that an inquire cycle hit a valid line in the processor’s instruction or data cache.

Driven The processor drives HIT every clock. The signal changes state two clocks after the assertion of EADS and retains that state until two clocks after the next EADS.

HIT is driven at all times, except while the processor is in the Stop Clock state or while RESET or INIT is asserted.

Details The processor asserts HIT if an inquire cycle address matches the address of a valid line in the processor’s instruction cache (such lines are always in the shared state) or of a shared, exclusive, or modified line in the processor’s data cache (called a cache hit). The processor holds HIT negated if the inquire cycle address does not match any valid address in either cache (called a cache miss).

Table 5-11 shows the relationship between HIT, HITM, and 
INV. Inquire cycle logic in systems with look-aside caches can 
be simplified by monitoring only HITM and ignoring HIT. This 
works because the resulting state of a hit line is determined 
only by the state of the INV input during the assertion of 
EADS: 

■ If INV is negated during a hit, the hit line — whether shared, 
exclusive, or modified — transitions to the shared state. Thus, 
the inquiring master can safely cache the same data in the 
shared state without knowing whether the inquire cycle hit 
in the processor’s cache (and thus, without system logic 
monitoring HIT). 

■ If INV is asserted during a hit, the hit line — whether shared, 
exclusive, or modified — transitions to the invalid state. If the 
line was modified before the inquire, HITM is also asserted 
and the line is written back before the invalidation; if the 
line was shared or exclusive before the inquire, no writeback 
occurs before the invalidation. 

■ If the inquire cycle misses, regardless of the state of INV, 
the inquiring master can cache the target data in the shared 
state, although it will not have enough information to cache 
that line in the exclusive state (this requires that HIT be 
monitored). 
Table 5-11. MESI-State Transitions for Inquire Cycles

                            Result of Cache Lookup
                            Inquire   Inquire Hit:               Inquire Hit:
Signal or Event             Miss      Shared or Exclusive        Modified
HIT                         0         1           1              1            1
HITM (1, 2)                 0         0           0              1            1
INV                         -         1           0              1            0
Write to Memory             no        no          no             writeback    writeback
                                                                 (32 bytes)   (32 bytes)
State After Inquire (3)     -         invalid     shared         invalid      shared

Notes:

- Don’t care or not applicable.

1. Asserted only for data cache hits to modified lines. Instruction cache lines can only be in the shared or invalid state.

2. HITM is never asserted while HIT is negated.

3. Transition occurs after any write to memory. Lines in the "shared" MESI state are said to be in the "writethrough" state. Those in the "exclusive" or "modified" MESI states are said to be in the "writeback" state.



Inquire cycle logic in systems with look-through caches, however, normally monitors both HIT and HITM because such systems often implement the write-once cache protocol. The
write-once protocol requires caching in the exclusive state at 
certain transitions, and the exclusive state can only be identi- 
fied if both HIT and HITM are monitored. For details on this 
protocol, see Section 6.2.6 on page 6-19. 
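Table 5-11 can be read as a pure function from a line’s MESI state and the INV input to the processor’s response. The sketch below is a hypothetical Python encoding of that table (names are illustrative). State "I" stands for a miss, in which case the address is simply not cached.

```python
from typing import NamedTuple


class InquireOutcome(NamedTuple):
    hit: int
    hitm: int
    writeback: bool  # 32-byte burst writeback before the state change
    new_state: str   # MESI state of the line after the inquire


def inquire(state: str, inv: int) -> InquireOutcome:
    """Apply Table 5-11 to a line in MESI state 'M', 'E', 'S', or 'I' (miss)."""
    if state == "I":
        return InquireOutcome(0, 0, False, "I")  # miss: INV has no effect
    if state in ("S", "E"):
        return InquireOutcome(1, 0, False, "I" if inv else "S")
    if state == "M":
        return InquireOutcome(1, 1, True, "I" if inv else "S")
    raise ValueError(f"not a MESI state: {state!r}")
```

The encoding also makes the look-aside simplification visible: a hit line always ends in the state selected by INV alone, whatever its prior state.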

Inquire cycles can be driven while LOCK is asserted, if 
AHOLD is used to obtain the bus for the inquire cycle. An 
inquire cycle cannot hit a line that is involved in a locked oper- 
ation (LOCK asserted). The processor prevents this by always 
checking its cache tags prior to a locked operation. If the loca- 
tion is cached, it is written back (if necessary) and invalidated 
prior to the locked operation. 

The Pentium processor does not recognize an inquire cycle hit 
on an in-progress cache line fill prior to the first BRDY, and it 
will cache that line in the exclusive state if PWT = 0 and WB/ 
WT = 1. This may cause the line to be cached in the exclusive 
state in two separate caches if the system supports other cach- 
ing masters. In such cases, the AMD-K5 processor asserts HIT 
and caches the line in the shared state or does not cache it, 
depending on the state of the INV signal. 
5.2.26 HITM (Inquire Cycle Hit To Modified Line)

Output

Summary The processor asserts HITM to indicate that an inquire cycle hit a modified line in the processor’s data cache. If this occurs, the processor writes the line back to memory during or after the bus-hold tenure, depending on which signal is holding the processor off the bus. HIT is always asserted whenever HITM is asserted.

Driven The processor drives HITM every clock. The signal changes state two clocks after the assertion of EADS. If the inquire cycle misses the cache or hits an exclusive or shared line in the cache, the processor holds HITM negated, and another inquire cycle can begin in that clock (two clocks after EADS). If the inquire cycle hits a modified line in the data cache, the processor asserts HITM and holds it asserted until one clock after the last BRDY of the writeback, then negates it.

HITM is driven at all times, except while the processor is in the Stop Clock state or while RESET or INIT is asserted.

Details The processor asserts HITM when an inquire cycle address matches the address of a modified line in the processor’s data cache. The processor then attempts to drive a four-transfer burst writeback of the modified line. If INV was asserted at the time EADS was asserted for the inquire cycle, a hit leaves the written-back line in the invalid state. If INV was negated at the time EADS was asserted, a hit leaves the written-back line in the shared state. For a comparison of the states that HITM, HIT, and INV can assume, see Table 5-11 on page 5-71.

System logic can use HITM to inhibit access to the bus by other masters (via BOFF or HOLD) until the writeback associated with the hit has completed. The time at which the writeback occurs depends on which input signal was used to hold the processor off the bus for the inquire cycle:

■ If AHOLD was used, the processor drives the writeback as early as two clocks after asserting HITM, whether or not AHOLD is still asserted at that time.

■ If BOFF or HOLD was used, the processor delays the writeback until after BOFF or HLDA is negated. In the case of BOFF, the writeback is driven before any aborted bus cycle is restarted.
The processor drives writebacks by asserting ADS and either reusing the inquire cycle address (if AHOLD is held asserted throughout the writeback) or driving the address itself (if AHOLD is negated for the writeback, or if BOFF or HOLD was used to obtain the bus). If AHOLD is held asserted throughout an inquire cycle and a subsequent writeback, system logic must latch the inquire cycle address when it asserts EADS and use the latched copy during the writeback. By contrast, if system logic always negates AHOLD before the writeback, the processor drives the writeback address when it asserts ADS for the writeback, and system logic need not latch a copy of the inquire cycle address.
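Which bus-hold input was used determines both when the HITM writeback can start and who supplies its address. The hypothetical Python sketch below captures those two decisions (the function name and return format are illustrative). When BOFF or HOLD was used, it returns no start clock, because the text only says the writeback waits until BOFF or HLDA is negated.

```python
from typing import Optional, Tuple

AHOLD, BOFF, HOLD = "AHOLD", "BOFF", "HOLD"


def plan_writeback(method: str, hitm_clock: int,
                   ahold_held_through: bool = False) -> Tuple[Optional[int], str]:
    """Return (earliest ADS clock of the writeback, source of its address).

    hitm_clock: clock in which the processor asserted HITM.
    ahold_held_through: AHOLD stays asserted through the writeback.
    """
    if method == AHOLD:
        # The processor asserts ADS but cannot drive A31-A3 while AHOLD is
        # asserted, so system logic must supply the address it latched at EADS.
        source = "system-logic latch" if ahold_held_through else "processor"
        return hitm_clock + 2, source
    if method in (BOFF, HOLD):
        return None, "processor"  # deferred until BOFF or HLDA is negated
    raise ValueError(f"unknown bus-hold method: {method!r}")
```

For a system that keeps AHOLD asserted, `plan_writeback(AHOLD, 14, True)` returns `(16, "system-logic latch")`, so an address latch is required in that design.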

Inquire cycles can be driven while LOCK is asserted, if 
AHOLD is used to obtain the bus for the inquire cycle. An 
inquire cycle cannot hit a line involved in a locked operation. 
Cached locations that are about to be accessed in locked opera- 
tions are written back and invalidated before the locked opera- 
tion occurs. If such an inquire cycle hits a modified location 
that is different than the one involved in the locked operation, 
the writeback is done in the middle of the locked operation, 
between the two locked cycles, and LOCK is asserted during 
the writeback. This is the only case in which another operation 
can intervene in a locked operation. System logic must recog- 
nize this case and know that the inquire cycle is snooping a dif- 
ferent location than the one that is locked. 

At the falling edge of RESET, the states of BRDYC and BUSCHK control the drive strength on the A21-A3 (not including A31-A22), ADS, HITM, and W/R signals. The drive strength is weak for all states of BRDYC and BUSCHK except when BRDYC and BUSCHK are both Low, in which case the drive strength is strong. The A31-A22 signals use the weak drive strength at all times. See the data sheet for details.
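The strapping rule reduces to one condition plus a fixed exception for the upper address lines. The sketch below is a hypothetical Python helper (the function name and pin-name strings are illustrative) that encodes only the signals named in the paragraph above and rejects any others.

```python
def drive_strength(brdyc: int, buschk: int, signal: str) -> str:
    """Drive strength selected by BRDYC and BUSCHK at the falling edge of RESET.

    brdyc, buschk: sampled pin levels (0 = Low, 1 = High).
    signal: pin name such as "A17", "ADS", "HITM", or "W/R".
    """
    if signal.startswith("A") and signal[1:].isdigit():
        n = int(signal[1:])
        if 22 <= n <= 31:
            return "weak"  # A31-A22 always use the weak drive strength
        if not 3 <= n <= 21:
            raise ValueError(f"no such address pin: {signal}")
    elif signal not in ("ADS", "HITM", "W/R"):
        raise ValueError(f"{signal} is not covered by this strapping option")
    # Strong only when both straps are Low; every other combination is weak.
    return "strong" if (brdyc == 0 and buschk == 0) else "weak"
```

For example, with both straps Low, A21 is strong while A22 stays weak.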
5.2.27 HLDA (Bus-Hold Acknowledge)

Output

Summary When system logic asserts HOLD, the processor completes any in-progress bus cycle, floats its cycle-driving outputs, and asserts HLDA as an acknowledgment. While HLDA is asserted, another bus master can drive cycles on the bus, including inquire cycles to the processor.

Driven The processor drives HLDA every clock. The processor floats the cycle-driving outputs on the bus and asserts HLDA two clocks after the last BRDY of an in-progress bus cycle, if such a cycle is in progress when HOLD is asserted, or two clocks after the assertion of HOLD, whichever comes last. The processor continues to float the bus and assert HLDA until two clocks after HOLD is negated.

HLDA is driven during cache hits in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM, but writebacks wait until HLDA is negated. HLDA is also driven in the Shutdown, Halt, Stop Grant, and Stop Clock states, or while AHOLD, BOFF, RESET, INIT, or PRDY is asserted. HLDA is not driven during processor-originated bus cycles, because any such pending bus cycle completes before the processor asserts HLDA.

Details HLDA is the processor’s acknowledgment to HOLD. HLDA indicates that any in-progress bus cycle has completed and that the output and bidirectional signals used for memory or I/O accesses are floating. Table 5-12 shows the signals floated. The same set of signals is floated with BOFF.



Table 5-12. Outputs Floated When HLDA is Asserted

Address and        Cycle Definition    Data and         Cache
Address Parity     and Control         Data Parity      Control
A31-A3             D/C                 D63-D0           CACHE
ADS                LOCK                DP7-DP0          PCD
ADSC               M/IO                BE7-BE0          PWT
AP                 SCYC                N/A              N/A
N/A                W/R                 N/A              N/A
Unlike BOFF, the assertion of HOLD does not abort an in-progress cycle. If the processor is not driving a bus cycle when
HOLD is asserted, the bus master asserting or causing the 
assertion of HOLD can begin driving its first bus cycle in the 
clock after HLDA is asserted, which occurs two clocks after 
HOLD is asserted. The processor supports only one in-progress 
bus cycle. Unlike the Pentium processor, no pending bus cycles 
are held in write buffers between the data cache and the bus 
interface on the AMD-K5 processor. 

The processor can assert ADS in the clock after HOLD is 
asserted (but before asserting HLDA) and drive a bus cycle 
before acknowledging HOLD with HLDA. System logic may 
assert EADS for an inquire cycle as early as one clock after the 
processor asserts HLDA. 

The processor continues driving HLDA until two clocks after 
HOLD is negated, at which time the processor may again drive 
its own cycles with ADS in the next clock after it negates 
HLDA. The processor responds to inquire cycles while HLDA 
is asserted and will assert HIT and HITM in response to such 
cycles. If HITM is asserted, the writeback is performed imme- 
diately after HLDA is negated. Multiple inquire cycles are not 
permitted to hit modified lines. The processor implements this 
restriction by ignoring EADS while HITM is asserted; when 
HITM is asserted, it is held asserted until one clock after the 
last BRDY of the writeback.

For a list of signals recognized while HLDA is asserted, see 
Table 5-2 on page 5-8. See the description of HOLD on page 5-76 for additional details about the HOLD/HLDA protocol.
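The HOLD/HLDA handshake timing described in this section reduces to simple clock arithmetic. The hypothetical Python sketch below computes when HLDA is asserted and negated (function names are illustrative). It assumes at most one in-progress cycle, which matches the statement that the AMD-K5 processor supports only one.

```python
from typing import Optional


def hlda_asserted(hold_asserted: int, last_brdy: Optional[int] = None) -> int:
    """Clock in which HLDA is asserted.

    last_brdy: clock of the last BRDY of a bus cycle in progress when HOLD
    was asserted, or None if no cycle was in progress.
    """
    earliest = hold_asserted + 2
    # "Whichever comes last": two clocks after HOLD or after the last BRDY.
    return earliest if last_brdy is None else max(earliest, last_brdy + 2)


def hlda_negated(hold_negated: int) -> int:
    """Clock in which HLDA is negated (two clocks after HOLD is negated)."""
    return hold_negated + 2
```

From these values, the other master (or system logic asserting EADS) can start in the clock after `hlda_asserted(...)`, and the processor can assert ADS again in the clock after `hlda_negated(...)`.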
5.2.28 HOLD (Bus-Hold Request)

Input

Summary When system logic asserts HOLD, the processor completes any in-progress bus cycle, floats its cycle-driving outputs, and asserts HLDA to acknowledge the HOLD.

Sampled and Acknowledged The processor samples HOLD every clock. It acknowledges HOLD by floating the cycle-driving outputs on the bus and asserting HLDA two clocks after the last BRDY of an in-progress bus cycle, if such a cycle is in progress when HOLD is asserted, or two clocks after the assertion of HOLD, whichever comes last. The processor continues to float the bus and assert HLDA until two clocks after HOLD is negated.

HOLD is sampled during memory cycles (including cache writethroughs and writebacks), I/O cycles, inquire cycles, and special bus cycles in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; in the Shutdown, Halt, Stop Grant, and Stop Clock states; or while AHOLD, BOFF, RESET, INIT, or PRDY is asserted. HOLD is not sampled during locked cycles or interrupt acknowledge operations.

Details The assertion of HOLD, like BOFF but unlike AHOLD, forces the processor to relinquish the full address and data bus to another bus master. The signal can be used for the following purposes:

■ Bus Turnaround — Another bus master can assert HOLD to the processor to obtain control of the bus, allowing the other bus master to drive any type of bus cycle.

■ Inquire Cycles — In multi-master systems with shared memory, another bus master typically drives an inquire cycle to the processor or its L2 cache prior to driving a read or write cycle to any memory locations shared by both masters. Such inquire cycles can be driven while HOLD is asserted.

HOLD provides the slowest response of the three bus-hold inputs and is normally useful only in single-bus (non-bridged), single-processor systems with a look-aside L2 cache. For example, a DMA controller may use HOLD to obtain the bus, run inquire cycles, and perform memory reads and writes. See Section 6.2.5 on page 6-14 for system configurations using HOLD.
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Like AHOLD but unlike BOLL, HOLD allows the processor to 
complete an in-progress bus cycle before the processor floats 
its cycle-driving outputs. Such an in-progress cycle may consist 
of a single-transfer cycle, burst cycle, sequence of locked 
cycles (such as an interrupt acknowledge operation), or a spe- 
cial bus cycle. The processor supports only one in-progress bus 
cycle; no pending bus cycles are buffered. Like BOLL, HOLD 
has no effect on writes to the processor’s store buffer, except to 
delay them. (The store buffer is situated between the execu- 
tion units and the data cache, and it is used for speculative 
stores prior to being written in non-speculative state to the 
data cache.) 

When HOLD is asserted, system logic may continue asserting 
HOLD for as long as it wants. The processor has no way of 
breaking the hold. The processor continues driving HLDA until 
two clocks after HOLD is negated, at which time the processor 
may again drive its own cycles with ADS in the next clock after 
it negates HLDA. During the time HOLD is asserted, the pro- 
cessor attempts to operate out of its cache. If it can no longer 
do so, it asserts BREQ continuously. 
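The HOLD/HLDA timing rules above can be written as a few small functions. This is a sketch, not a timing model of the real part: it assumes clocks are numbered with integers, that "two clocks after" means adding 2 to the clock number, and that the processor's own ADS can appear in the clock right after HLDA is negated. The function names are hypothetical.

```python
def hlda_assert_clock(hold_clock, last_brdy_clock=None):
    """Return the clock on which HLDA is asserted.

    hold_clock      -- clock on which system logic asserts HOLD
    last_brdy_clock -- clock of the last BRDY of the cycle in progress when
                       HOLD was asserted, or None if the bus was idle
    """
    candidates = [hold_clock + 2]               # two clocks after HOLD
    if last_brdy_clock is not None:
        candidates.append(last_brdy_clock + 2)  # two clocks after last BRDY
    return max(candidates)                      # "whichever comes later"


def hlda_negate_clock(hold_negate_clock):
    """HLDA stays asserted until two clocks after HOLD is negated."""
    return hold_negate_clock + 2


def earliest_ads_clock(hold_negate_clock):
    """The processor may drive ADS in the clock after it negates HLDA."""
    return hlda_negate_clock(hold_negate_clock) + 1


# A burst in progress ends with its last BRDY in clock 13; HOLD arrives in clock 10.
print(hlda_assert_clock(10, last_brdy_clock=13))  # 15
print(hlda_assert_clock(10))                      # 12 (bus idle)
print(earliest_ads_clock(20))                     # 23
```

Taking the maximum of the two candidate clocks is a direct encoding of "whichever comes later". A bus cycle that finished well before HOLD arrived does not delay HLDA.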

There are three methods by which system logic can obtain control of the address bus to drive an inquire cycle: AHOLD, BOFF, or HOLD. AHOLD obtains control only of the address bus and allows another master to drive only inquire cycles. BOFF and HOLD obtain control of the full bus (address and data), so another master can drive not only inquire cycles but also read and write cycles. Unlike BOFF, AHOLD and HOLD both permit an in-progress bus cycle to complete. However, writebacks can occur while AHOLD is asserted, whereas writebacks that become pending while HOLD is asserted occur after HOLD is negated, as with BOFF.

If EADS is asserted on the same clock that HOLD is negated, 
the processor recognizes this as a valid inquire cycle and han- 
dles it correctly. However, if EADS is asserted on the clock fol- 
lowing the negation of HOLD, the AMD-K5 processor does not 
recognize this as a valid inquire cycle. 

See the description of HLDA on page 5-74 for additional 
details about the HOLD/HLDA protocol. 



Signal Descriptions 



5-77 





AMD-K5 Processor Technical Reference Manual 



18524C/0- Nov 1996 



5.2.29 IERR (Internal Error)

Output

Summary

The processor drives IERR only in Functional-Redundancy Checking mode. If the processor is the checker and it detects a difference in signal outputs between the master and itself, it asserts IERR. No other errors are reported with IERR.

Driven

The processor drives IERR every clock while it is operating as the checker in Functional-Redundancy Checking mode. If an error is detected, IERR is asserted for one clock, starting two clocks after the detection of the error.

IERR is driven only in Functional-Redundancy Checking mode when the processor is the checker, including while PRDY is asserted within this mode.

Details

The processor enters Functional-Redundancy Checking mode 
as the checker if FRCMC is asserted at RESET. In this mode, 
all of the processor’s output and bidirectional signals (except 
IERR and TDO) are floated and tied to those of the master pro- 
cessor. Both processors execute the same instructions, and the 
checker compares the state of the master’s output and bidirec- 
tional signals with the state that the checker itself would have 
driven for the same instruction stream. 

If a mismatch occurs on such a comparison, the checker asserts 
IERR for one clock, two clocks after the detection of the error. 
Both the master and the checker continue running the check- 
ing program after an error occurs. No action other than the 
assertion of IERR is taken by the processor. 
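The checker's comparison and the IERR timing above can be sketched as follows. This is a minimal model, not the actual comparison logic. It assumes that each clock's master outputs and the checker's expected outputs are available as comparable values, and that an error is "detected" in the clock where the mismatch appears. IERR is modeled logically (True = asserted), with its pin polarity ignored, and the function name `ierr_trace` is hypothetical.

```python
def ierr_trace(master_outputs, checker_outputs):
    """Return IERR per clock (True = asserted) for a checker in FRC mode."""
    if len(master_outputs) != len(checker_outputs):
        raise ValueError("traces must cover the same clocks")
    # Two extra clocks leave room for the two-clock reporting delay.
    ierr = [False] * (len(master_outputs) + 2)
    for clock, (seen, expected) in enumerate(zip(master_outputs, checker_outputs)):
        if seen != expected:
            ierr[clock + 2] = True  # one clock, two clocks after detection
    return ierr


master = [0b1010, 0b1100, 0b0110, 0b0001]
checker = [0b1010, 0b1100, 0b0111, 0b0001]  # mismatch in clock 2
print(ierr_trace(master, checker))
# [False, False, False, False, True, False]
```

Both processors keep running after a mismatch, so the sketch simply continues scanning the trace. It does not stop at the first error.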

No other errors are reported with IERR. Unlike the Pentium 
processor, the AMD-K5 processor does not report parity errors 
on IERR for every cache or TLB access. Instead, the AMD-K5 
processor fully tests cache parity during the built-in self test 
(BIST), which is invoked by asserting INIT during RESET. 
5.2.30 IGNNE (Ignore Numeric Error)

Input

Summary

IGNNE, together with the numeric error (NE) bit in CR0, is used by system logic to control how an unmasked floating-point exception caused by a previous floating-point instruction affects the execution of a subsequent floating-point instruction or WAIT instruction, hereafter referred to as the target instruction.

Sampled

The processor samples IGNNE every clock during memory cycles (including cache writethroughs and writebacks), cache hits of all types, I/O cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM, and while AHOLD, BOFF, or HLDA is asserted. IGNNE is not sampled in the Shutdown, Halt, Stop Grant, or Stop Clock states, or while RESET, INIT, or PRDY is asserted.

System logic can drive the signal either synchronously or asynchronously (see the data sheet for synchronous setup and hold times).

Details

If an unmasked floating-point exception is pending and the tar- 
get instruction is considered error-sensitive, then the relation- 
ship between NE and IGNNE is as follows: 

■ If NE = 0, then: 

• If IGNNE is sampled asserted, the processor ignores the 
floating-point exception and continues with the execu- 
tion of the target instruction. 

• If IGNNE is sampled negated, the processor waits until it 
samples IGNNE, INTR, SMI, NMI, or INIT asserted. 

If IGNNE is sampled asserted while waiting, the proces- 
sor ignores the floating-point exception and continues 
with the execution of the target instruction. 

If INTR, SMI, NMI, or INIT is sampled asserted while 
waiting, the processor handles its assertion appropri- 
ately. 

■ If NE = 1, the processor invokes the INT 10h exception
handler.
If an unmasked floating-point exception is pending and the target instruction is considered error-insensitive, then the processor ignores the floating-point exception and continues with the execution of the target instruction.
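The NE/IGNNE rules above reduce to a small decision function. This sketch assumes that the caller already knows whether the target instruction is error-sensitive and what level IGNNE was sampled at, with True meaning logically asserted. The enum and function names are hypothetical. The WAIT outcome stands for the stall described above, which ends when IGNNE, INTR, SMI, NMI, or INIT is sampled asserted.

```python
from enum import Enum


class FpErrorAction(Enum):
    CONTINUE = "ignore the exception and execute the target instruction"
    WAIT = "stall until IGNNE, INTR, SMI, NMI, or INIT is sampled asserted"
    INT_10H = "invoke the INT 10h exception handler"


def fp_error_action(ne: int, ignne_asserted: bool, error_sensitive: bool) -> FpErrorAction:
    """Action at a target instruction while an unmasked FP exception is pending."""
    if not error_sensitive:
        return FpErrorAction.CONTINUE  # error-insensitive: always ignored
    if ne:
        return FpErrorAction.INT_10H   # NE = 1: IGNNE has no effect
    if ignne_asserted:
        return FpErrorAction.CONTINUE  # NE = 0, IGNNE asserted
    return FpErrorAction.WAIT          # NE = 0, IGNNE negated


for ne in (0, 1):
    for ignne in (False, True):
        print(ne, ignne, fp_error_action(ne, ignne, error_sensitive=True).name)
```

The NE check comes before the IGNNE check because, according to the rules above, IGNNE only matters when NE = 0. FERR is deliberately not modeled, since neither NE nor IGNNE affects it.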

FERR is not affected by the state of the NE bit or IGNNE. 
FERR is always asserted at the instruction boundary of the tar- 
get instruction that follows the floating-point instruction that 
caused the unmasked floating-point exception. 

This signal is provided to allow the system logic to handle 
exceptions in a manner consistent with IBM-compatible PC/AT 
systems. 
5.2.31 INIT (Initialization)

Input

Summary

The assertion of INIT causes the processor to reinitialize its system registers and certain other resources, but it preserves the contents of the caches, the floating-point state, and certain other resources. If INIT is asserted at RESET, it invokes the processor’s built-in self test (BIST).

Sampled

The processor samples INIT every clock and recognizes it at the next instruction boundary. INIT is a rising-edge-triggered interrupt and is latched when sampled. However, to be recognized reliably, the signal must be negated for two clocks before it is asserted.

INIT is sampled during memory cycles (including cache writethroughs and writebacks), cache accesses, I/O cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086); in the Shutdown, Halt, and Stop Grant states; and while AHOLD, BOFF, HLDA, or RESET is asserted. INIT is not sampled in the Stop Clock state or while PRDY is asserted.

If INIT is asserted on the falling edge of RESET, the processor performs its built-in self test (BIST) before initialization and code fetching begin. System logic can drive the signal either synchronously or asynchronously (see the data sheet for synchronous setup and hold times).

If INIT is asserted at the same time as RESET, RESET is recognized but INIT is not. If INIT and NMI are both asserted during the Stop Grant state (not necessarily simultaneously), the AMD-K5 processor recognizes INIT after leaving the Stop Grant state and then recognizes NMI before fetching any instructions. The Pentium processor does not recognize the NMI in this case.

INIT is the fifth-highest-priority external interrupt. For details on its relationship to other interrupts and exceptions, see Section 5.1.3 on page 5-13 and Table 5-3 on page 5-16.

Details

INIT is typically asserted after power-up in response to a BIOS interrupt that writes to an I/O port, often when the operator presses Ctrl-Alt-Del. The BIOS writes to a port (such as port 64h in the keyboard controller) that asserts INIT.

INIT is also used to support 286 software that must return to 
Real mode after accessing extended memory in Protected 
mode. The 286 processor does not have an INIT input; a transi- 
tion from Protected mode to Real mode can only be made on 
the 286 processor by asserting RESET. With the INIT signal, 
however, the operating system, through a BIOS interrupt, can 
cause the transition without loss of cache contents or floating- 
point state. 

Upon recognizing an INIT interrupt at the next instruction 
retirement boundary, the processor performs the following 
actions, in the order shown: 

1. Flush Pipeline — The processor invalidates the: 

• Instruction pipeline 

• Translation look-aside buffer (TLB) 

2. Reinitialize — The processor reinitializes the following 
resources to reset values: 

• General-purpose registers 

• System registers 

3. Jump To BIOS — The processor jumps to the BIOS at address
FFFF_FFF0h, the same entry point used after RESET. (See
the description of RESET on page 5-109 for details on the
aliasing of this boot address.)

Unlike RESET, INIT does not reinitialize the data and instruction caches, floating-point registers, model-specific registers, or the cache disable (CD) and not-writethrough (NW) bits in CR0.
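The three INIT steps, together with the list of resources INIT leaves alone, can be captured in a toy state model. This is a sketch under stated assumptions. The class, its fields, and `apply_init` are hypothetical. Reset values are supplied by the caller rather than taken from the processor. CR0.CD and CR0.NW are split out as separate fields so that the model can show they survive INIT while the other system registers are reinitialized.

```python
from dataclasses import dataclass, field

BIOS_ENTRY = 0xFFFF_FFF0  # same entry point used after RESET


@dataclass
class CpuModel:
    gprs: dict = field(default_factory=dict)         # EAX, EBX, ...
    system_regs: dict = field(default_factory=dict)  # other system registers
    cr0_cd: int = 0                                  # preserved by INIT
    cr0_nw: int = 0                                  # preserved by INIT
    pipeline: list = field(default_factory=list)
    tlb: dict = field(default_factory=dict)          # linear page -> physical page
    caches: dict = field(default_factory=dict)       # preserved by INIT
    fpu: dict = field(default_factory=dict)          # preserved by INIT
    msrs: dict = field(default_factory=dict)         # preserved by INIT
    fetch_address: int = 0


def apply_init(cpu, gpr_reset_values, sys_reset_values):
    """Apply the three INIT steps in order; reset values come from the caller."""
    # 1. Flush: invalidate the instruction pipeline and the TLB.
    cpu.pipeline.clear()
    cpu.tlb.clear()
    # 2. Reinitialize the general-purpose and system registers.
    cpu.gprs = dict(gpr_reset_values)
    cpu.system_regs = dict(sys_reset_values)
    # 3. Jump to the BIOS entry point.
    cpu.fetch_address = BIOS_ENTRY
    # caches, fpu, msrs, cr0_cd and cr0_nw are deliberately left untouched.
    return cpu


cpu = CpuModel(gprs={"EAX": 5}, cr0_cd=1, fpu={"ST0": 1.5}, tlb={0x1000: 0x5000})
apply_init(cpu, gpr_reset_values={"EAX": 0}, sys_reset_values={})
print(hex(cpu.fetch_address), cpu.tlb, cpu.fpu, cpu.cr0_cd)
# 0xfffffff0 {} {'ST0': 1.5} 1
```

The fields that `apply_init` never writes make up, in effect, the "preserved by INIT" list given above. This is what lets 286-style software return to Real mode without losing cache contents or floating-point state.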

A20M should not be asserted during the first code fetch follow- 
ing the INIT cycle. The operating system alone is responsible 
for controlling the state of A20M by writing to an external reg- 
ister provided for this purpose. (See the description of A20M 
on page 5-18.) 

INIT can be driven at a predictable time, relative to program order, only by using an I/O write. Because the signal is recognized on an instruction boundary, if initialization is to be performed immediately after an I/O write, INIT must be held asserted three clocks before the BRDY of that write in order to prevent another cycle from starting.

INIT invokes the processor’s built-in self test (BIST) if asserted 
at the falling edge of RESET. The BIST runs a series of tests on 
the internal hardware that exercise the following resources — 
all cache tags (linear and physical) and cache arrays, the entry- 
point and instruction-decode PLAs, and the microcode ROM. 
At the end of the BIST, a value representing the result of the 
tests is stored in the EAX register. Zero means passed and any 
other value means failed. The processor continues with its nor- 
mal boot process after the BIST completes, whether the BIST 
passed or failed. 

The processor recognizes BOFF, HOLD, AHOLD, and R/S while
INIT is asserted, but these signals will not intervene in the ini-
tialization process except that they will prevent the first code
fetch (jump to BIOS) after the registers are initialized.

No other exceptions or interrupts will intervene in the initial- 
ization process. The first code fetch after the registers are ini- 
tialized will occur before another interrupt or exception is 
recognized. The processor latches the assertion of any edge- 
triggered interrupt (FLUSH, SMI, INIT, NMI) while INIT is 
asserted and recognizes latched interrupts in priority order 
when INIT is negated. If INIT is asserted during the Stop Grant 
state, the signal is held pending until after the processor exits 
the Stop Grant state, at which point it is acted upon. 
5.2.32 INTR (Maskable Interrupt)

Input

Summary

The assertion of INTR, if enabled by software (unmasked), causes the processor to acknowledge the interrupt and enter an interrupt service routine. The routine is specified by the vector obtained during the acknowledgment.

Sampled and Acknowledged

The processor samples INTR every clock and recognizes it at the next instruction boundary. INTR is a level-sensitive interrupt and must be held asserted until recognized. When the processor recognizes INTR, it acknowledges it by driving an interrupt acknowledge bus operation (a cycle pair).

INTR is sampled during memory cycles (including cache writethroughs and writebacks), cache accesses, I/O cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM, and in the Halt state. INTR is not sampled in the Shutdown, Stop Grant, or Stop Clock states, or while AHOLD, BOFF, HLDA, RESET, INIT, or PRDY is asserted.

INTR is the seventh-highest-priority external interrupt. For details on its relationship to other interrupts and exceptions, see Section 5.1.3 on page 5-13 and Table 5-3 on page 5-16.

System logic can drive the signal either synchronously or asynchronously (see the data sheet for synchronous setup and hold times).

Details

In typical PC systems, maskable interrupts are driven to the processor on INTR from external interrupt-control logic that prioritizes the interrupts from several I/O devices. The processor recognizes INTR only if software has enabled it by setting the interrupt flag (IF) in the EFLAGS register to 1.

Upon recognizing an INTR interrupt at the next instruction-retirement boundary, the processor performs the following actions, in the order shown:

1. Flush Pipeline — The processor invalidates all instructions remaining in the pipeline.

2. Acknowledge — The processor drives an interrupt acknowledge operation (a cycle pair) on the bus. System logic must return a BRDY in response to both cycles. Table 5-13 shows the signal values driven during the first and second bus cycles. Both bus cycles are reads, but any data returned on the first cycle is ignored. On the second cycle, the processor samples only the enabled data byte (D7-D0) to obtain the interrupt vector. (The interrupt vector is an index into an interrupt table containing gate or segment descriptors.) The bus cycles are driven as a locked pair, with a minimum of one idle clock between the cycles and with LOCK asserted throughout. System logic may respond as quickly as it is able; BRDY operates in the normal manner to terminate each of the two cycles. The first cycle is provided only for compatibility with the original protocol; it carries no useful information.



Table 5-13. Interrupt Acknowledge Operation Definition

Processor Outputs   First Bus Cycle   Second Bus Cycle
D/C                 0                 0
M/IO                0                 0
W/R                 0                 0
BE7-BE0             EFh               FEh (low byte enabled)
A31-A3              0                 0
D63-D0              (ignored)         Interrupt vector expected from interrupt controller on D7-D0



3. Disable Interrupts — The processor clears the IF bit in the 
EFLAGS register if (a) the processor is in Real mode, or (b) 
the processor is in Protected mode and the interrupt vector 
points to an interrupt gate or to a task gate that references 
a TSS that has its IF bit cleared. (For details on how the IF 
bit is managed in Virtual-8086 mode, see page 3-12.) 

4. Service Interrupt — Using the interrupt vector as an entry 
point, the processor saves its state and accesses a data 
structure set up by the operating system. In Real mode, the 
processor accesses the interrupt vector table (IVT); in Pro- 
tected mode, it accesses the interrupt descriptor table 
(IDT). The vector identifies one of 256 gates (descriptors) in 
the table. The IDT, for example, can contain interrupt, trap, 
or task gates, all of which point indirectly to the entry point 
of an interrupt service routine. 
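Steps 2 and 4 can be illustrated with a small calculation. The vector comes only from D7-D0 of the second acknowledge cycle, and the processor then uses it to locate an entry in the interrupt table. This sketch assumes that Real-mode IVT entries are 4-byte far pointers (offset, then segment) starting at the default table base of 0, and that Protected-mode IDT gates are 8 bytes each, starting at the IDT base. The function names are hypothetical.

```python
def _check_vector(vector: int) -> None:
    if not 0 <= vector <= 0xFF:
        raise ValueError("an interrupt vector selects one of 256 table entries")


def vector_from_second_cycle(data_bus: int) -> int:
    """Only the enabled byte, D7-D0, of the second acknowledge cycle is used."""
    return data_bus & 0xFF


def real_mode_ivt_entry(vector: int, ivt_base: int = 0) -> int:
    """Real mode: 4-byte far pointers (offset, then segment), base 0 by default."""
    _check_vector(vector)
    return ivt_base + vector * 4


def protected_mode_idt_entry(idt_base: int, vector: int) -> int:
    """Protected mode: 8-byte gate descriptors starting at the IDT base."""
    _check_vector(vector)
    return idt_base + vector * 8


vector = vector_from_second_cycle(0xDEAD_BEEF_0000_0021)   # upper bytes ignored
print(hex(vector))                                         # 0x21
print(hex(real_mode_ivt_entry(vector)))                    # 0x84
print(hex(protected_mode_idt_entry(0x0010_0000, vector)))  # 0x100108
```

Because only the low byte is kept, whatever the interrupt controller leaves on D63-D8 has no effect on the computed entry address.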
The interrupt service routine, upon entry, may re-enable interrupts by setting the IF bit in EFLAGS before servicing the interrupt. This is typically done if the routine is lengthy, so that the processor can respond to higher-priority interrupts while the current interrupt is being serviced, thus allowing nested interrupts. Upon return from the service routine via an IRET instruction, the processor pops the contents of the CS, EIP, and EFLAGS registers (at a minimum) from the stack and continues where it left off.

System logic typically is not able to determine the instruction 
boundary on which the processor recognizes INTR. Thus, as a 
practical matter, system logic should hold INTR asserted until 
the beginning of the interrupt acknowledge operation, or until 
there is some other evidence that the interrupt service routine 
has been entered (for example, the access to the IDT address). 

The processor disables INTR interrupts during all software 
interrupts by clearing the IF bit in EFLAGS. Software may re- 
enable INTR interrupts by setting IF to 1 again on entering the 
service routine. In this context, software interrupts include: 

■ In Real mode, any INTn instruction 

■ In Protected mode, any INTn instruction that vectors to an 
IDT entry that is an interrupt gate, or that is a task gate 
which references a TSS with the interrupt flag (IF) cleared 
in its EFLAGS image. (INTn instructions that vector to a 
trap gate are not considered software interrupts because 
the processor does not clear IF in such cases). 

If system logic can leave the INTR signal asserted after the 
INTR service routine is entered, the interrupt vector returned 
by system logic during the interrupt acknowledge operation 
must (in Protected mode) be for an interrupt gate, or for a task 
gate that references a TSS with its IF cleared. If the returned 
vector is not one of these two types, the processor will again 
respond to INTR prior to executing the first instruction of the 
service routine, causing an infinite loop. 
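The IF rules in step 3 and in the software-interrupt list above can be condensed into one predicate. This is a sketch that assumes gate types are passed as strings. Virtual-8086 handling (page 3-12) is deliberately not modeled, and the function name is hypothetical.

```python
def if_cleared_on_entry(mode, gate=None, tss_if=None):
    """True if EFLAGS.IF is clear when the handler's first instruction runs.

    mode   -- "real" or "protected"
    gate   -- "interrupt", "trap", or "task" (protected mode only)
    tss_if -- IF bit in the EFLAGS image of the TSS a task gate references
    """
    if mode == "real":
        return True
    if mode != "protected":
        raise ValueError("only real and protected mode are modeled")
    if gate == "interrupt":
        return True
    if gate == "trap":
        return False  # IF is left unchanged
    if gate == "task":
        return tss_if == 0
    raise ValueError(f"unknown gate type: {gate!r}")


# A vector for a trap gate is unsafe if INTR may still be asserted on entry.
for gate, tss_if in (("interrupt", None), ("trap", None), ("task", 0), ("task", 1)):
    print(gate, tss_if, if_cleared_on_entry("protected", gate, tss_if))
```

As a usage rule, if system logic can leave INTR asserted after the handler is entered, this predicate must return True for every vector that logic supplies. Otherwise the processor responds to INTR again before executing the handler's first instruction.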

The processor recognizes BOFF, HOLD, and AHOLD while
INTR is asserted, and these signals will intervene in the INTR
service routine. Other interrupts can intervene in the INTR
interrupt on entry into the INTR service routine.
INTR is not recognized if asserted while AHOLD, BOFF, or HLDA is asserted, because the processor cannot drive the interrupt acknowledge operation and therefore cannot obtain the interrupt vector.
5.2.33 INV (Invalidate Cache Line)

Input

Summary

During an inquire cycle, the state of INV determines whether the addressed cache line, if found in the processor’s instruction or data cache, transitions to the invalid or shared state.

Sampled

INV is sampled with the same timing as EADS. See the description of EADS on page 5-58.

Details

If INV is asserted when EADS is asserted at the beginning of 
an inquire cycle, the processor transitions the line (if found) to 
the invalid state, regardless of the state in which the cache line 
was found; such cycles are sometimes called invalidate cycles, 
or simply invalidations. If INV is negated when EADS is 
asserted, the processor transitions the line (if found) to the 
shared state. In either case, if the line is found in the modified 
state, the processor writes it back to memory before changing 
its state. 

INV is typically asserted during a write by another caching 
master. In such cases, INV can be generated by watching W/R 
from another bus master and asserting INV to the processor, 
along with EADS, only on writes. This method invalidates a 
copy that the processor may have cached, whether modified or 
not, for the same location being written by the other bus mas- 
ter. The processor’s assertion of HITM and/or HIT does not 
influence how INV affects a line found in the cache. Those two 
outputs simply indicate whether the line was found (HIT) and 
whether a writeback will follow (HITM). If INV is asserted dur- 
ing the inquire, the resulting state of the line (invalid) is 
entirely determined by INV, without reference to HITM and/or 
HIT. If INV is negated during the inquire, the resulting state of 
a hit line (shared) is also entirely determined by INV, but sys- 
tem logic will not know whether a writeback is imminent with- 
out monitoring HITM, and another bus master will not be able 
to cache the line in the exclusive state without monitoring HIT. 

For a comparison of the states that HITM, HIT, and INV can 
assume, see Table 5-11 on page 5-71. 
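The inquire-cycle behavior described above can be summarized as a MESI transition function. This is a sketch in which "I" also stands for "line not present" and INV is given as a logical level (True = asserted). The function name is hypothetical. The result reports HIT and HITM along with the new state, to make clear that the new state depends on INV alone.

```python
def snoop(line_state: str, inv: bool):
    """Return (new_state, hit, hitm, writeback) for an inquire cycle.

    line_state -- MESI state of the addressed line ("M", "E", "S", or "I")
    inv        -- INV sampled with EADS (True = asserted)
    """
    if line_state not in ("M", "E", "S", "I"):
        raise ValueError(f"not a MESI state: {line_state!r}")
    if line_state == "I":
        return "I", False, False, False  # miss: nothing changes
    hitm = line_state == "M"             # modified line is written back first
    new_state = "I" if inv else "S"      # decided by INV alone
    return new_state, True, hitm, hitm


for state in "MESI":
    print(state, snoop(state, inv=True), snoop(state, inv=False))
```

In this model, a writeback occurs exactly when HITM is reported, which matches the description above: HITM only tells system logic that a writeback will follow. It never changes the final state.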
5.2.34 KEN (External Cache Enable)

Input

Summary

System logic overrides the cacheability of read cycles with KEN. If KEN is negated during a read cycle, the data returned to the processor is not cached. If KEN is asserted at that time, cacheability and the MESI state of cached lines depend on the states of the CACHE and PWT outputs and the WB/WT input.

Sampled

The processor samples KEN in the same clock as the first BRDY or NA of the read cycle, whichever comes first.

KEN is sampled only during memory reads in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM. KEN is not sampled during memory writes, inquire cycles, I/O cycles, locked cycles, special bus cycles, or interrupt acknowledge operations; in the Shutdown, Halt, Stop Grant, or Stop Clock states; or while BOFF, HLDA, RESET, INIT, or PRDY is asserted. While AHOLD is asserted, KEN is sampled only to complete a bus cycle that began before AHOLD was asserted.

Details

System logic typically maintains a specification of address 
cacheability in external registers that are written by BIOS at 
boot time. The BIOS does this by knowing or determining the 
address ranges of memory-mapped I/O ports and other loca- 
tions that should be noncacheable. For example, video and net- 
work boards are normally mapped by BIOS to the high-memory 
area between 640 Kbyte and 1 Mbyte, an area that is non- 
cacheable for both functional and security reasons. (The pro- 
cessor would not be able to detect changes in the state of mem- 
ory-mapped network or semaphore I/O ports that are cached, 
and video frames written to a writeback cache would not be 
visible on a display.) 

In Protected mode (paging enabled), the operating system can 
map linear addresses to physical addresses using pages that it 
knows to be cacheable or non-cacheable. But in non-paging 
modes, the operating system has no control over cacheability 
and the external cacheability registers are the only available 
mechanism for determining whether an address is cacheable or 
non-cacheable. 
The processor’s CACHE output can be used to initiate an address lookup in the external cacheability registers, and the result of the lookup can be used to drive KEN.

If the address of an access falls within a cacheable range, KEN must be asserted during the first BRDY or NA of the bus cycle, whichever comes first. If KEN and CACHE are both asserted during a memory read, the processor performs the read cycle as a four-transfer burst that fills a cache line, and four BRDYs must be returned with the data. If either KEN or CACHE is negated during the first BRDY or NA of the cycle, the processor ends the cycle at that BRDY or NA, with only a single quadword transfer. The processor ignores KEN during writes or while CACHE is negated. For details on data-cache MESI state transitions during reads, see Table 5-9 on page 5-51.
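The burst-versus-single-transfer rule and the KEN sampling point can be written as two small helpers. This is a sketch that treats KEN and CACHE as logical levels (True = asserted) rather than as active-low pins, and numbers clocks with integers. The helper names are hypothetical.

```python
def read_transfer_count(ken: bool, cache: bool) -> int:
    """BRDYs expected for a memory read: 4 for a line fill, else 1 quadword."""
    return 4 if (ken and cache) else 1


def ken_sample_clock(first_brdy_clock: int, na_clock=None) -> int:
    """KEN is sampled with the first BRDY or NA, whichever comes first."""
    if na_clock is None:
        return first_brdy_clock
    return min(first_brdy_clock, na_clock)


print(read_transfer_count(ken=True, cache=True))    # 4
print(read_transfer_count(ken=True, cache=False))   # 1
print(ken_sample_clock(first_brdy_clock=5, na_clock=3))  # 3
```

Because KEN is sampled only once, at the earlier of those two events, system logic must have the result of its cacheability lookup ready by that clock.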

If all of the cache ways in which a potential line fill can be 
cached are already filled with valid entries, the processor 
selects a line to replace during the line fill. In the data cache, if 
the selected line is in the modified state, the processor writes 
the modified line back to memory before filling the vacated 
cache line with the new contents. 

If BOFF is asserted after the first eight bytes of a cache-line fill have been returned with BRDY and KEN, the processor uses those first eight bytes but does not cache them, and the line fill is aborted. When BOFF is negated, the entire bus cycle is restarted from the beginning, and system logic must again drive KEN to the same state that was sampled before the backoff. Thus, system logic cannot use BOFF to change the state of KEN, and therefore cannot use it to change the cacheability of a line.

On the 486 processor, KEN is sampled twice (on the first and last transfers of a burst) and must be asserted both times for a burst read to be treated as a cache-line fill. On the AMD-K5 and Pentium processors, however, KEN is sampled only once, with the first BRDY or NA of the cycle, whichever comes first.
5.2.35 LOCK (Bus Lock)

Output

Summary

The processor asserts LOCK during certain sequences of bus cycles that require integrity. To preserve the processor’s handling of these sequences, system logic should prevent other bus masters from intervening in locked cycles.

Driven and Floated

For locked operations, the processor asserts LOCK with ADS and holds it asserted until the last expected BRDY of the last bus cycle in the locked operation. The processor negates LOCK for at least one clock (called a dead or idle clock) between sequential locked operations.

LOCK is driven during memory cycles and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM. LOCK is not driven, or is not meaningful, during cache writethroughs or writebacks, I/O cycles, or special bus cycles; in the Shutdown, Halt, Stop Grant, or Stop Clock states; or while BOFF, HLDA, RESET, INIT, or PRDY is asserted. While AHOLD is asserted, LOCK is driven only to complete a locked cycle that was initiated before AHOLD was asserted.

The processor floats LOCK one clock after system logic asserts BOFF and in the same clock that the processor asserts HLDA.

Details

The processor always locks the following types of memory operations:

■ Interrupt Acknowledge Operations — These are a pair of read 
cycles used to obtain an interrupt vector in response to the 
assertion of INTR. 

■ Descriptor-Table Accesses — These involve segment descrip- 
tors in the global descriptor table (GDT), local descriptor 
table (LDT) or interrupt descriptor table (IDT) and occur in 
Protected mode. The processor performs them during a seg- 
ment load to ensure that the Accessed (A) bit in code and 
data descriptors is set to 1, or to test and set the Busy (B) bit 
in TSS descriptors. The sequence is as follows: (1) the pro- 
cessor drives an unlocked read of the descriptor to see if 
the relevant bit is set to 1, (2) if the bit is cleared to 0, the 
processor then drives a locked read-modify-write to set the 
bit to 1. During updates to the Accessed and Busy bits, the 
AMD-K5 processor drives a locked four-byte read and four-byte
write sequence. The Pentium processor, however, drives a
locked eight-byte read and one-byte write sequence.

Independent of these actions by the processor, the operat- 
ing system can clear the Accessed or Busy bits to 0 for book- 
keeping purposes. The operating system may do this 
however it wishes, but if locking is to be used for the mem- 
ory accesses it is the operating system’s responsibility to 
initiate locking with an XCHG or a LOCK instruction pre- 
fix. 

■ Page Directory and Page Table Accesses — The processor per- 
forms these accesses during each TLB miss to set the 
Accessed (A) bit to 1 in the relevant page directory and/or 
page table entry, and during each write access to set the 
Dirty (D) bit to 1 in the relevant page table entry, if those 
bits are not already set. These accesses work in a manner 
similar to descriptor table accesses, described immediately 
above, except that the operating system typically clears the 
Accessed and Dirty bits before the processor sets them, so 
that the operating system can thereafter identify pages that 
have been accessed and updated. 

■ XCHG Instruction — When XCHG is used to swap a register 
with a memory location, the access is unconditionally 
locked. 

■ LOCK Prefix — Application programs can add the LOCK
prefix to the following instructions if the destination oper- 
and resides in memory: ADC, ADD, AND, BT, BTC, BTR, 
BTS, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, and XCHG 
(redundant). The locking applies only to the bus cycle gen- 
erated by that single instruction. Other uses of the LOCK 
prefix generate an undefined opcode fault. 
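The locking rules for XCHG and the LOCK prefix in the list above can be expressed as a single check. This sketch takes the instruction list in this section at face value. It models the fault as a Python exception, and the names `LOCKABLE`, `UndefinedOpcodeFault`, and `bus_locked` are hypothetical. It does not attempt to decode real instruction encodings.

```python
LOCKABLE = frozenset({
    "ADC", "ADD", "AND", "BT", "BTC", "BTR", "BTS", "DEC", "INC",
    "NEG", "NOT", "OR", "SBB", "SUB", "XOR", "XCHG",
})


class UndefinedOpcodeFault(Exception):
    """Stands in for the fault raised by an invalid use of LOCK."""


def bus_locked(mnemonic: str, lock_prefix: bool, dest_in_memory: bool) -> bool:
    """Return True if the instruction's bus cycles are locked."""
    op = mnemonic.upper()
    if op == "XCHG" and dest_in_memory:
        return True  # locked unconditionally, prefix or not
    if not lock_prefix:
        return False
    if op in LOCKABLE and dest_in_memory:
        return True
    raise UndefinedOpcodeFault(f"LOCK {op} (destination in memory: {dest_in_memory})")


print(bus_locked("add", lock_prefix=True, dest_in_memory=True))    # True
print(bus_locked("XCHG", lock_prefix=False, dest_in_memory=True))  # True
print(bus_locked("ADD", lock_prefix=False, dest_in_memory=True))   # False
```

Under this model, a LOCK prefix on a listed instruction whose destination is a register faults in the same way as a LOCK prefix on an unlisted instruction, since both fall under "other uses of the LOCK prefix".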

Locked operations normally consist of pairs of bus cycles, typi- 
cally read followed by write, except in the case of interrupt 
acknowledge pairs which are read-read. If the locked cycles 
are misaligned, the processor runs multiple pairs of bus cycles, 
during which LOCK and SCYC are both asserted throughout. 
For example, a misaligned, locked, read-modify-write sequence 
appears on the bus as two read cycles followed by two write 
cycles. Thus, up to four bus cycles can occur for misaligned 
accesses. (The AMD-K5 processor runs certain misaligned bus 
cycles in the opposite order from the Pentium processor; see 
the description of SCYC on page 5-114 for details.) 
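The cycle pattern for a locked read-modify-write can be listed as follows. For an aligned access, the pattern is one read and then one write. For a misaligned access, it is both reads and then both writes. This is a sketch under two loud assumptions. First, an access is split only when it crosses an 8-byte (quadword) boundary of the 64-bit bus. Second, the order of the two halves is left as a parameter, because, as noted above, the AMD-K5 and Pentium processors order certain misaligned cycles differently (see SCYC). The function name is hypothetical.

```python
def locked_rmw_cycles(address, size, bus_bytes=8, low_half_first=True):
    """List (kind, address, length) bus cycles for a locked read-modify-write.

    LOCK would be asserted on every cycle listed, and SCYC as well when the
    access is split.
    """
    if not 0 < size <= bus_bytes:
        raise ValueError("size must be between 1 and bus_bytes")
    end = address + size                                # one past the last byte
    boundary = (address // bus_bytes + 1) * bus_bytes  # next quadword boundary
    if end <= boundary:                                # fits in one bus word
        return [("read", address, size), ("write", address, size)]
    halves = [(address, boundary - address), (boundary, end - boundary)]
    if not low_half_first:
        halves.reverse()
    reads = [("read", a, n) for a, n in halves]
    writes = [("write", a, n) for a, n in halves]
    return reads + writes


for cycle in locked_rmw_cycles(0x0E, 4):
    print(cycle)
```

A 4-byte operand at address 0Eh straddles the boundary at 10h. The sketch therefore produces four cycles, which is the "up to four bus cycles" case described above.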
The processor always negates LOCK for at least one idle clock
between sequential locked operations. For example, if a read- 
modify-write is followed by another read-modify-write, there is 
an unlocked idle clock (sometimes called a dead clock) 
between the two sequences to allow system logic to reallocate 
the bus to another bus master. During this idle clock, the pro- 
cessor responds to all signals and pending interrupts. 

The processor responds to AHOLD and BOFF while LOCK is asserted, and it recognizes but does not respond to HOLD until the clock after the last BRDY of the locked operation. The processor recognizes all other inputs and outputs used for memory cycles except KEN, NA, and WB/WT.

Inquire cycles can be driven while LOCK is asserted if AHOLD 
is used to obtain the bus for the inquire cycle. If such an 
inquire cycle occurs before the last write of the locked opera- 
tion and the inquire hits a modified cache location, the write- 
back is done in the middle of the locked operation between the 
two locked cycles with LOCK asserted during the writeback. 
System logic must recognize this case and know that the 
inquire cycle is snooping and writing back a different location 
than the one that is locked. 

Locked operations cannot be performed on cached locations, 
and an inquire cycle cannot hit a line that is involved in the 
locked operation. The processor prevents this by always check- 
ing its cache tags prior to a locked operation. If the location is 
cached, it is written back (if necessary) and invalidated prior 
to the locked operation. This policy is necessary to support reli- 
able semaphores for multiple caching devices, because such 
semaphores must never be cached and should only be accessed 
using locked operations. 

If system logic asserts AHOLD while the processor is complet- 
ing a locked cycle already begun before the assertion of 
AHOLD, the system must not allow accesses by other bus mas- 
ters to lock the same address that the processor is locking. 

If BOFF is asserted during a locked operation, only the cycle(s) 
aborted before their last BRDY and the cycles not yet run are 
restarted after BOFF is negated. Thus, system logic must keep 
track of all cycles in the locked operation that have completed 
before the assertion of BOFF and must continue the locked 
operation immediately after BOFF is negated, except that if a 
writeback is pending when BOFF is negated, the writeback
takes precedence over the restarting of the aborted cycles in 
the locked operation. 

For purposes of interrupts and exceptions, locked operations 
are treated by the processor as if the entire multi-cycle opera- 
tion were a single instruction. Thus, interrupts and exceptions 
are not recognized during locked operations. The processor 
samples BUSCHK if it is asserted with any BRDY of a locked 
operation, but the processor does not generate an enabled 
machine check interrupt for the BUSCHK until after the 
locked operation completes, and thus the exception will not 
intervene in the locked operation. If an edge-triggered inter- 
rupt (FLUSH, SMI, INIT, or NMI) is asserted during a locked 
operation, the interrupt is latched and recognized after the 
locked operation completes, even if the interrupt signal is not 
held asserted until the locked operation completes. 
5.2.36 M/IO (Memory or I/O)

Output

Summary The processor drives M/IO to indicate whether it is accessing memory or I/O on the bus. The signal is driven at the same time as the other two cycle definition signals, D/C and W/R. A specific encoding of D/C, M/IO, and W/R identifies one of several special bus cycles.

Driven and Floated M/IO is driven and floated with the same timing as D/C. See the description of D/C on page 5-53.

Details The processor accesses I/O when it executes an I/O instruction (any of the INx or OUTx instructions). The processor accesses memory when it fetches instructions or executes an instruction that loads or stores data. Accesses to memory-mapped I/O ports appear on the bus as memory accesses.

Only data (not code) can be read from or written to the I/O address space; the cycle definition for an I/O code read (D/C = 0, M/IO = 0, W/R = 0) defines an interrupt acknowledge cycle, and the cycle definition for an I/O code write (D/C = 0, M/IO = 0, W/R = 1) defines a special bus cycle.

The processor specifies all special bus cycles with D/C = 0, M/IO = 0, and W/R = 1. The cycles are then differentiated by BE7-BE0 and A31-A3.
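The three cycle definition outputs together select one of eight encodings, which system logic can decode with a simple lookup. The following is a minimal C sketch, assuming the usual x86 polarity in which M/IO High selects memory, D/C High selects data, and W/R High selects a write (consistent with the encodings given above). The enum and function names are hypothetical, and treating the memory code-write encoding as reserved is an assumption not stated in this section.

```c
#include <stdbool.h>

/* Hypothetical names for the bus cycle types selected by M/IO, D/C, and W/R. */
typedef enum {
    CYCLE_INTERRUPT_ACK, /* M/IO=0 D/C=0 W/R=0: I/O "code read" encoding */
    CYCLE_SPECIAL,       /* M/IO=0 D/C=0 W/R=1: further decoded from BE7-BE0 */
    CYCLE_IO_READ,       /* M/IO=0 D/C=1 W/R=0 */
    CYCLE_IO_WRITE,      /* M/IO=0 D/C=1 W/R=1 */
    CYCLE_CODE_READ,     /* M/IO=1 D/C=0 W/R=0: instruction fetch */
    CYCLE_RESERVED,      /* M/IO=1 D/C=0 W/R=1: code write (assumed never generated) */
    CYCLE_MEM_READ,      /* M/IO=1 D/C=1 W/R=0 */
    CYCLE_MEM_WRITE      /* M/IO=1 D/C=1 W/R=1 */
} bus_cycle;

/* Each argument is the level sampled on the pin: true = High (1), false = Low (0). */
bus_cycle decode_cycle(bool m_io, bool d_c, bool w_r)
{
    /* Pack the three signals into a 3-bit index: M/IO is bit 2, D/C bit 1, W/R bit 0. */
    unsigned code = ((unsigned)m_io << 2) | ((unsigned)d_c << 1) | (unsigned)w_r;
    static const bus_cycle table[8] = {
        CYCLE_INTERRUPT_ACK, CYCLE_SPECIAL, CYCLE_IO_READ,  CYCLE_IO_WRITE,
        CYCLE_CODE_READ,     CYCLE_RESERVED, CYCLE_MEM_READ, CYCLE_MEM_WRITE
    };
    return table[code];
}
```

A decoder that sees CYCLE_SPECIAL would then examine BE7-BE0 to tell the individual special bus cycles apart, as the text above describes.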
5.2.37 NA (Next Address)

Input

Summary The assertion of NA indicates that external memory is prepared for a pipelined cycle.

Sampled The processor samples NA from one clock after ADS until the first expected BRDY of a bus cycle.

NA is sampled during memory cycles and writethroughs in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM. NA is not sampled during writebacks, I/O cycles, locked cycles, special bus cycles, or interrupt acknowledge operations; in the Shutdown, Halt, Stop Grant, or Stop Clock states; or while BOFF, HLDA, RESET, INIT, or PRDY is asserted. While AHOLD is asserted, NA is sampled only to complete a bus cycle already begun before the assertion of AHOLD.

Details System logic asserts NA when external memory is prepared to accept a pipelined cycle. The AMD-K5 processor drives the pending ADS two clocks after NA is sampled active. NA does not generate a pipelined cycle when LOCK is asserted, during writeback cycles, or when no internal cycle is pending; locked and writeback cycles are never pipelined. KEN and WB/WT are sampled when NA or BRDY is asserted, whichever comes first.

Refer to the appropriate data sheet for model-specific details regarding the operation of NA.
5.2.38 NMI (Non-Maskable Interrupt)

Input

Summary The assertion of NMI causes the processor to enter an interrupt service routine using a predefined interrupt vector.

Sampled The processor samples NMI every clock and recognizes it at the next instruction boundary. NMI is a rising-edge-triggered interrupt and is latched when sampled. The signal must be negated for at least four clocks before being asserted.

NMI is sampled during memory cycles (including cache writethroughs and writebacks), cache accesses, I/O cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; in the Shutdown, Halt, and Stop Grant states; and while AHOLD, BOFF, or HLDA is asserted. NMI is not sampled in the Stop Clock state or while RESET, INIT, or PRDY is asserted.

If INIT and NMI are both asserted during the Stop Grant state (not necessarily simultaneously), the AMD-K5 processor recognizes the INIT after leaving the Stop Grant state and then recognizes the NMI before fetching any instructions. Current implementations of the Pentium processor do not recognize the NMI in such cases, although future implementations may.

NMI is the sixth-highest-priority external interrupt. For details on its relationship to other interrupts and exceptions, see Section 5.1.3 on page 5-13 and Table 5-3 on page 5-16.

System logic can drive the signal either synchronously or asynchronously (see the data sheet for synchronously driven setup and hold times).

Details NMI is normally used by system software to report errors such as parity errors, low battery, I/O channel check, board removal, timeout, and other system states that require operator attention. If such an error occurs, system software can, for example, display a screen message and wait for the operator to continue operation, if possible. In this sense, the applications for NMI are similar to those for BUSCHK and the Shutdown state, although the three are not functionally related. In typical PC systems, the signal is controlled by a system software interrupt to BIOS or a write to an I/O port (such as port 61h and/or 92h). In spite of its name, some PC systems allow the interrupt to be masked with a write to an I/O port (such as port 70h).
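As a sketch of the port 70h masking mentioned above, the C function below forms the byte that software would write to the CMOS index port. It assumes the PC/AT convention, in which bit 7 of that byte masks NMI and bits 6-0 select a CMOS RAM location that is then accessed through port 71h. This convention belongs to the system logic, not to the processor, and the function and macro names are hypothetical. The actual port write is omitted because it is privileged and platform-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: PC/AT-compatible system logic, not defined by the processor. */
#define CMOS_INDEX_PORT   0x70u  /* write: NMI mask (bit 7) + CMOS location (bits 6-0) */
#define CMOS_DATA_PORT    0x71u  /* read/write: contents of the selected location */
#define CMOS_NMI_MASK_BIT 0x80u

/* Returns the byte to write to CMOS_INDEX_PORT to select `location`,
   optionally masking NMI at the same time. */
uint8_t cmos_index_byte(uint8_t location, bool mask_nmi)
{
    uint8_t value = (uint8_t)(location & 0x7Fu); /* keep only the location bits */
    if (mask_nmi)
        value |= CMOS_NMI_MASK_BIT;              /* set bit 7 to block NMI */
    return value;
}
```

For example, `cmos_index_byte(0x0F, true)` yields 8Fh, which selects CMOS location 0Fh (the location cited in the RESET description) with NMI masked, while `cmos_index_byte(0x0F, false)` yields 0Fh and leaves NMI unmasked.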

Upon recognizing an NMI interrupt at the next instruction 
retirement boundary, the processor performs the following 
actions, in the order shown: 

1. Flush Pipeline — The processor invalidates all instructions 
remaining in the pipeline. 

2. Service Interrupt — The processor saves its state and 
accesses vector 2 in the interrupt vector table (IVT) or 
interrupt descriptor table (IDT), depending on whether the 
processor is running in Real mode or Protected mode. The 
vector identifies a gate descriptor in the table. The IDT, for 
example, can contain interrupt, trap, or task gates, all of 
which point indirectly to the entry point of an interrupt ser- 
vice routine. 

The processor recognizes BOFF, HOLD, and AHOLD while NMI is asserted, and these signals will intervene in the NMI service routine. The processor latches the assertion of any edge-triggered interrupt (FLUSH, SMI, INIT, NMI) while BUSCHK is asserted and recognizes latched interrupts in priority order when BUSCHK is negated. If NMI is asserted during the Stop Grant state, the signal is held pending until after the processor exits the Stop Grant state, at which point it is acted upon.

During SMM, the Pentium processor does not respond to NMI 
until the beginning of its response to the first INTR or software 
interrupt (INTn) to occur after entering SMM. NMIs can thus 
be enabled by using a dummy interrupt. When an INTR or soft- 
ware interrupt is recognized, the processor first responds to a 
pending NMI interrupt before executing the first instruction of 
the INTR handler. By contrast, the AMD-K5 processor recog- 
nizes a pending NMI interrupt after returning (via the IRET 
instruction) from a prior interrupt. 
5.2.39 PCD (Page Cache Disable)

Output

Summary The processor drives PCD to indicate the operating system’s specification of cacheability for the entire current page. System logic can use PCD to control external caching.

Driven and Floated The processor drives PCD from the clock in which ADS is asserted until the last expected BRDY of the bus cycle.

PCD is driven during memory cycles (including cache writethroughs and writebacks) and locked cycles in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM. While AHOLD is asserted, PCD is driven only to complete a bus cycle that had been initiated before AHOLD was asserted. PCD is not driven during special bus cycles or interrupt acknowledge operations, or in the Shutdown, Halt, or Stop Grant states (except for writebacks due to inquire cycles). PCD is never driven during the Stop Clock state or while BOFF, HLDA, RESET, INIT, or PRDY is asserted.

The processor floats PCD one clock after system logic asserts BOFF and in the same clock that the processor asserts HLDA.

Details If PCD is negated during read misses, the page being accessed may or may not be cacheable, depending on the state of other signals. If PCD is asserted during any type of access, the page is noncacheable. The PCD output affects the processor’s caching of data only during read misses. It has no effect on the processor during read hits, write misses, or write hits, as shown in Tables 5-17 and 5-18 on page 5-135.

The state of the PCD output is a page-level specification of cacheability based on the state of several bits written by the operating system. In Protected mode, the PCD output specifies the cacheability of the entire page being accessed. The bits that determine the PCD output are stored in one of the processor’s control registers or its TLB. Those bits include the cache disable (CD) bit in CR0, the paging enable (PG) bit in CR0, and the page cache disable (PCD) bit in one of three locations. The selection of bits depends on the processor’s operating mode and the type of access, as follows:
■ In Real mode, or in Protected and Virtual-8086 modes while paging is disabled (PG bit in CR0 cleared to 0):

PCD output = CD bit in CR0

(Thus, whenever the CD bit in CR0 is set to 1, the PCD output is asserted and the access is noncacheable.)

■ In Protected and Virtual-8086 modes while caching is enabled (CD bit in CR0 cleared to 0) and paging is enabled (PG bit in CR0 set to 1):

For accesses to I/O space, page directory entries, and other non-paged accesses:

PCD output = PCD bit in CR3

For accesses to 4-Kbyte page table entries or 4-Mbyte pages:

PCD output = PCD bit in page directory entry

For accesses to 4-Kbyte pages:

PCD output = PCD bit in page table entry
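The selection rules above can be modeled as a small decision function. This C sketch is illustrative only: the type and function names are hypothetical, and because the rules above do not cover the case in which paging is enabled while the CD bit in CR0 is set to 1, the sketch assumes PCD is asserted in that case.

```c
#include <stdbool.h>

/* Hypothetical classification of the access that starts the bus cycle. */
typedef enum {
    ACCESS_NONPAGED,  /* I/O space, page directory entries, other non-paged accesses */
    ACCESS_PDE_LEVEL, /* 4-Kbyte page table entries or 4-Mbyte pages */
    ACCESS_4K_PAGE    /* locations within 4-Kbyte pages */
} access_kind;

/* Returns true when the PCD output would be asserted (page noncacheable).
   cr0_cd, cr0_pg           : CD and PG bits in CR0
   cr3_pcd, pde_pcd, pte_pcd: PCD bits in CR3, the page directory entry,
                              and the page table entry that apply to the access */
bool pcd_output(bool cr0_cd, bool cr0_pg, access_kind kind,
                bool cr3_pcd, bool pde_pcd, bool pte_pcd)
{
    if (!cr0_pg)          /* Real mode, or paging disabled: PCD follows CR0.CD */
        return cr0_cd;
    if (cr0_cd)           /* ASSUMPTION: paging on with CD = 1 is not covered above */
        return true;
    switch (kind) {       /* paging on, caching enabled: pick the governing bit */
    case ACCESS_NONPAGED:  return cr3_pcd;
    case ACCESS_PDE_LEVEL: return pde_pcd;
    case ACCESS_4K_PAGE:   return pte_pcd;
    }
    return true;          /* not reached for valid access_kind values */
}
```

The same structure would serve for PWT, except that in the non-paged case PWT is driven Low (writeback) rather than following the CD bit.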

The method of selecting the PCD bit is similar to that for the PWT bit, described on page 5-105. The cache disable (CD) and not-writethrough (NW) bits in CR0 are cleared to 0 for normal, cacheable operation. If a location is already cached before the operating system sets a PCD bit to 1, any access to that location will hit in the cache regardless of the state of the PCD bit or signal.

CACHE is partially determined by the PCD bit. Thus, the states of CACHE and PCD are very often the same. CACHE is never asserted when PCD is asserted. PCD indicates the cacheability of an entire page, and CACHE indicates the burstability of a particular bus cycle; burstability is a necessary but insufficient condition for determining cacheability. The cacheability of a particular bus cycle is determined during read cycles when system logic asserts KEN while the processor asserts CACHE. KEN is not a factor in determining the state of the PCD or CACHE signals. The processor drives both PCD and CACHE before it knows the state of KEN. For details, see the descriptions of CACHE and KEN on pages 5-49 and 5-89.
5.2.40 PCHK (Parity Status)

Output

Summary The processor asserts PCHK if it detects a data parity error (the processor checks for even parity) on one or more bytes of D63-D0 during a read cycle.

Driven The processor drives PCHK for one clock, two clocks after each BRDY during read cycles.

PCHK is driven for memory and I/O reads, locked reads, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM, or while PRDY is asserted. PCHK is not driven during any type of write cycle or during special bus cycles; during the Shutdown, Halt, Stop Grant, or Stop Clock states; or while BOFF, HLDA, RESET, or INIT is asserted. While AHOLD is asserted, PCHK is driven only to complete a bus cycle already begun before the assertion of AHOLD.

Details Each parity bit on DP7-DP0 covers one byte of D63-D0 (DP0 covers D7-D0, DP1 covers D15-D8, and so on). To determine data parity, the processor considers each parity bit together with the eight data bits of its byte. If the total number of 1 bits among those nine bits is even, the byte is considered free of error (thus the term even parity). If the number of 1 bits is odd, the byte is considered to have an error. During burst reads, the processor checks all eight bytes of D63-D0 against the parity bits sampled on DP7-DP0. During single-transfer reads, only the enabled bytes on D63-D0 and the corresponding parity bits on DP7-DP0 (as specified by BE7-BE0) are checked.

If PEN is asserted during the BRDY for a read cycle, and the processor reports a data parity error on PCHK for that cycle, the processor latches the physical address and cycle definition of the failed bus cycle and (optionally) generates a machine check exception. See the description of PEN on page 5-102 for details.

If an error is reported on PCHK, the system must nevertheless return all remaining BRDYs for that bus cycle: one BRDY for single-transfer cycles and four BRDYs for burst cycles. Systems that do not implement data parity generation and checking should tie DP7-DP0 either High or Low and ignore the PCHK output.
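The per-byte even-parity check described above can be sketched in C as follows. The function names and the `enabled` argument are hypothetical: `enabled` carries the byte enables as logical values (bit i set means byte i is enabled), not the electrical levels of BE7-BE0. In this model, PCHK would be asserted for the transfer whenever the returned mask is nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of 1 bits in one byte. */
unsigned ones_in_byte(uint8_t b)
{
    unsigned n = 0;
    while (b) {
        n += b & 1u;
        b >>= 1;
    }
    return n;
}

/* Returns a mask with bit i set when byte i of D63-D0 fails the even-parity
   check against parity bit DPi.
   data    : value sampled on D63-D0 (byte i = bits 8i+7 .. 8i)
   dp      : value sampled on DP7-DP0 (bit i covers byte i)
   enabled : logical byte enables, bit i = 1 when byte i is enabled (assumption)
   burst   : true for a burst read, in which all eight bytes are checked */
uint8_t parity_error_mask(uint64_t data, uint8_t dp, uint8_t enabled, bool burst)
{
    uint8_t errors = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (!burst && !(enabled & (1u << i)))
            continue;                           /* single transfer: skip disabled bytes */
        uint8_t byte = (uint8_t)(data >> (8 * i));
        unsigned ones = ones_in_byte(byte) + ((dp >> i) & 1u);
        if (ones % 2 != 0)                      /* odd count of 1 bits: parity error */
            errors |= (uint8_t)(1u << i);
    }
    return errors;
}
```

For instance, data 0000_0000_0000_0001h with DP7-DP0 = 00h fails on byte 0 (one 1 bit), while the same data with DP0 = 1 passes.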
5.2.41 PEN (Parity Enable)

Input

Summary System logic can assert PEN to enable latching of cycle information and (optionally) generation of a machine check exception for data bus parity errors during read cycles.

Sampled The processor samples PEN with every BRDY during read cycles.

PEN is sampled for memory and I/O reads, locked reads, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM, or while PRDY is asserted. PEN is not sampled during any type of write cycle or during special bus cycles; during the Shutdown, Halt, Stop Grant, or Stop Clock states; or while BOFF, HLDA, RESET, or INIT is asserted. While AHOLD is asserted, PEN is sampled only to complete a bus cycle already begun before the assertion of AHOLD.

Details If PEN is asserted when a data parity error is reported on PCHK, the processor latches the physical address and cycle definition of the failed bus cycle in its 64-bit machine check address register (MCAR) and its 64-bit machine check type register (MCTR). These registers can be read with the RDMSR instruction. See Section 3.3.5 on page 3-33 for details on this instruction.

In addition to latching the cycle address and definition, the processor also generates a machine check exception (12h) if the MCE bit in CR4 is set to 1 while PEN is asserted. System logic must then handle the error externally. Typical PC systems provide a mechanism for asserting NMI during a parity error.

If PEN is negated, neither the address and cycle definition latching nor the machine check exception generation occurs.

The MCE bit in CR4 also enables the generation of a machine check exception during bus cycle errors that are indicated on the BUSCHK input. The machine check mechanism is not, however, used for address parity errors indicated on APCHK.
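The interaction of PCHK, PEN, and the MCE bit in CR4 amounts to a small decision table. The C sketch below restates it; the struct, field, and function names are hypothetical and only summarize the behavior described above.

```c
#include <stdbool.h>

/* Hypothetical summary of the processor's response to a data parity error. */
typedef struct {
    bool latch_mcar_mctr; /* capture address in MCAR and cycle definition in MCTR */
    bool machine_check;   /* generate the machine check exception (vector 12h) */
} parity_response;

/* pchk_error   : a data parity error is reported on PCHK for the transfer
   pen_asserted : PEN is sampled asserted with the BRDY of that transfer
   cr4_mce      : the MCE bit in CR4 is set to 1 */
parity_response respond_to_parity_error(bool pchk_error, bool pen_asserted, bool cr4_mce)
{
    parity_response r = { false, false };
    if (pchk_error && pen_asserted) {
        r.latch_mcar_mctr = true;     /* latching always happens when PEN is asserted */
        r.machine_check = cr4_mce;    /* the exception additionally requires CR4.MCE */
    }
    return r;                         /* PEN negated: neither action occurs */
}
```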
5.2.42 PRDY (Probe Ready)

Output

Summary The processor asserts PRDY to acknowledge the system logic’s assertion of R/S or execution of the Test Access Port (TAP) instruction USEHDT, and to indicate the processor’s entry into the Hardware Debug Tool (HDT) mode for debugging.

Driven The processor drives PRDY every clock in response to either R/S or the TAP instruction USEHDT. The processor asserts PRDY at the next instruction boundary after R/S is sampled Low or when the USEHDT instruction is executed; the latter causes the processor to assert PRDY without a transition on R/S. After PRDY is asserted by either means, the processor negates PRDY on the latest of (a) the clearing of the TAP instruction register, (b) a TAP reset, or (c) a Low-to-High transition on R/S.

PRDY is driven in memory cycles (including writethroughs and writebacks), cache accesses, and I/O cycles in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; in the Shutdown, Halt, and Stop Grant states; and while AHOLD, BOFF, HLDA, or RESET is asserted. PRDY is not driven during locked cycles, special bus cycles, or interrupt acknowledge operations; during the Stop Clock state; or while INIT is asserted.

Details The HDT is entered either when external debug logic drives R/S Low or when it loads the TAP instruction register with the USEHDT instruction. If R/S is used to initiate the HDT, the debug logic must hold R/S Low throughout the debug session. If the USEHDT instruction is used to initiate the HDT, the processor asserts PRDY without a transition on R/S.

The processor negates PRDY and begins fetching instructions for normal operation one clock after a Low-to-High transition on R/S, when the TAP instruction register is cleared, or when the TAP is reset.

Debug software can force the processor into SMM, but the processor does not recognize SMI or any other interrupts while PRDY is asserted. If system hardware or software needs to assert RESET, it must exit the HDT before asserting RESET.

Documentation on the HDT is available under nondisclosure agreement to test and debug developers. For information, contact your AMD sales representative or field application engineer.
5.2.43 PWT (Page Writethrough)

Output

Summary The processor drives PWT to indicate the operating system’s specification of writeback or writethrough state for the entire current page. PWT, together with WB/WT, specifies the data-cache MESI state of cacheable read misses and write hits.

Driven and Floated The processor drives PWT from the clock in which ADS is asserted until the last expected BRDY of the bus cycle.

PWT is driven during memory cycles (including cache writethroughs and writebacks) and locked cycles in the normal operating modes (Real, Protected, and Virtual-8086), in SMM, and when PRDY is asserted. If AHOLD is asserted, PWT is driven only to complete a bus cycle that had been initiated before AHOLD was asserted. PWT is not driven during special bus cycles or interrupt acknowledge operations, or in the Shutdown, Halt, or Stop Grant states (except for writebacks due to inquire cycles). PWT is never driven during the Stop Clock state or while BOFF, HLDA, RESET, or INIT is asserted.

The processor floats PWT one clock after system logic asserts BOFF and in the same clock that the processor asserts HLDA.

Details As Table 5-14 shows, lines in the modified or exclusive MESI state are said to be in the writeback state, which corresponds to PWT = 0. Lines in the shared MESI state are said to be in the writethrough state, which corresponds to PWT = 1.



Table 5-14. PWT, Writeback/Writethrough, and MESI

MESI State    Writeback/Writethrough State    PWT State
modified      writeback                       0
exclusive     writeback                       0
shared        writethrough                    1
invalid       invalid                         -



System logic can use the PWT output, along with its WB/WT input, to determine how the processor will control internal caching. Tables 5-17 and 5-18 on page 5-135 show how the states of PWT and WB/WT determine the MESI state of a line in the data cache after a cache-line fill or writeback. If WB/WT is Low or PWT is High during a read miss or a write hit to a shared line, the accessed line is cached in, transitions to, or remains in the shared state after the access. If PWT is Low and WB/WT is High, the accessed line is cached in, transitions to, or remains in the exclusive state after a read miss or the first write hit. A subsequent write to an exclusive line changes it to modified.
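The line-state rules in the preceding paragraph can be captured in two C functions. This is a sketch, not the processor’s actual logic: it assumes the read miss does result in a cache-line fill (which also depends on PCD, CACHE, and KEN), the names are hypothetical, and signal levels are passed as true for High.

```c
#include <stdbool.h>

typedef enum { MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED } mesi_state;

/* State of a newly filled line after a cacheable read miss.
   pwt  : level of the PWT output
   wbwt : level sampled on the WB/WT input */
mesi_state state_after_read_miss(bool pwt, bool wbwt)
{
    /* PWT Low and WB/WT High: writeback (exclusive); otherwise writethrough (shared). */
    return (!pwt && wbwt) ? MESI_EXCLUSIVE : MESI_SHARED;
}

/* State of a line after a write hit, given its current state. */
mesi_state state_after_write_hit(mesi_state current, bool pwt, bool wbwt)
{
    switch (current) {
    case MESI_SHARED:      /* first write hit: promote only for PWT Low, WB/WT High */
        return (!pwt && wbwt) ? MESI_EXCLUSIVE : MESI_SHARED;
    case MESI_EXCLUSIVE:   /* subsequent write to an exclusive line */
    case MESI_MODIFIED:
        return MESI_MODIFIED;
    default:               /* a write to an invalid line is a miss, not a hit */
        return MESI_INVALID;
    }
}
```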

The state of the PWT output is based on the state of several bits written by the operating system. In Protected mode, the PWT output applies to the entire current page rather than to the specific bus cycle that the WB/WT input applies to, and it reflects the operating system’s (rather than the processor hardware’s) determination of writeback or writethrough state.

The bits that determine the PWT output are stored in a processor control register or the TLB. Those bits include the paging enable (PG) bit in CR0 and the page writethrough (PWT) bit in one of three locations. The selection of bits depends on the processor’s operating mode and the type of access, as follows:

■ In Real mode, and in Protected and Virtual-8086 modes while paging is disabled (PG bit in CR0 cleared to 0):

PWT output = Low (writeback)

■ In Protected and Virtual-8086 modes while paging is enabled (PG bit in CR0 set to 1):

For accesses to I/O space, page directory entries, and other non-paged accesses:

PWT output = PWT bit in CR3

For accesses to 4-Kbyte page table entries or 4-Mbyte pages:

PWT output = PWT bit in page directory entry

For accesses to 4-Kbyte pages:

PWT output = PWT bit in page table entry

The method of selecting the PWT bit is similar to that for the PCD bit, as described on page 5-99. The cache disable (CD) and not-writethrough (NW) bits in CR0 are cleared to 0 for normal, cacheable operation.

In the Hardware Debug Tool (HDT) mode, PWT is only mean- 
ingful for cache write misses (PWT = 0 and WB/WT = 1 transi- 
tion a shared line to an exclusive line). The signal is not 
meaningful during cache read misses in HDT mode, because 
the caches are never filled during HDT mode. 
5.2.44 R/S (Run or Stop)

Input

Summary External hardware and software use R/S to control entry into and exit from the Hardware Debug Tool (HDT) mode, which supports access to the processor’s DR7-DR0 debug registers through an external debug port. The AMD-K5 processor implements the HDT in a manner different from the Pentium processor’s Probe mode.

Sampled and Acknowledged The processor samples R/S every clock and recognizes it at the next instruction boundary. R/S is a level-sensitive interrupt with an internal pullup resistor. It must be held asserted until recognized. When it is recognized, the processor acknowledges R/S by asserting PRDY at the next instruction boundary.

R/S is sampled during memory cycles (including writethroughs and writebacks), cache accesses, and I/O cycles in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; in the Shutdown, Halt, and Stop Grant states; and while AHOLD, BOFF, HLDA, RESET, or INIT is asserted. R/S is not sampled during locked cycles, special bus cycles, or interrupt acknowledge operations, or during the Stop Clock state.

R/S is the second-highest-priority external interrupt. For details on its relationship to other interrupts and exceptions, see Section 5.1.3 on page 5-13 and Table 5-3 on page 5-16.

Test logic can drive the signal either synchronously or asynchronously (see the data sheet for synchronously driven setup and hold times).

Details The Hardware Debug Tool (HDT), sometimes referred to as the Debug Port or Probe mode, is a collection of signals, registers, and processor microcode that is enabled when external debug logic drives R/S Low or loads the processor’s Test Access Port (TAP) instruction register with the USEHDT instruction.

At the next instruction retirement boundary after system debug logic drives R/S Low or loads the TAP instruction register with the USEHDT instruction, the processor performs the following actions, in the order shown:

1. Flush Pipeline — The processor invalidates all instructions remaining in the pipeline.

2. Acknowledge — The processor asserts PRDY to acknowledge the interrupt and mark entry into the HDT mode. The processor does not save its state before asserting PRDY because it will continue execution at the next instruction after returning from the debug session, when R/S and PRDY are negated.

If R/S is used to initiate the HDT, the debug logic must hold R/S Low throughout the debug session. The processor negates PRDY and begins fetching instructions for normal operation one clock after a Low-to-High transition on R/S, or when the TAP instruction register is cleared or the TAP is reset.

The processor recognizes AHOLD, BOFF, and HOLD while R/S is Low, and these signals will intervene in the HDT mode when PRDY is asserted. However, exceptions and interrupts are not recognized in the HDT mode. The processor latches the assertion of any edge-triggered interrupt (FLUSH, SMI, INIT, NMI) during the HDT mode and recognizes the latched interrupts in priority order when PRDY is negated. See Table 5-3 on page 5-16 for the priority of interrupts and exceptions.

Documentation on the HDT is available under nondisclosure agreement to test and debug developers. For information, contact your AMD sales representative or field application engineer.

The AMD-K5 processor implements the HDT mode in a manner different from the Pentium processor’s Probe mode. For details on the processor’s PRDY acknowledgment of R/S, see page 5-103. For details on TAP testing, see Section 7.8 on page 7-19.
5.2.45 RESET (Reset)

Input

Summary The assertion of RESET initializes the processor to the power-up state.

Sampled The processor samples RESET every clock and recognizes it at the next instruction boundary. The RESET process begins at the falling edge of RESET. To be recognized, RESET must be held asserted for at least 1 ms after Vcc and CLK reach specification.

The following inputs are sampled on the falling edge of RESET:

■ The BF inputs (BF1-BF0) are sampled to select the frequency ratio between the processor’s internal clock and the bus clock (CLK).

■ If FLUSH is asserted, the processor invokes the three-state (float) test.

■ If FRCMC is asserted, the processor enters Functional-Redundancy Checking mode as the checker.

■ If INIT is asserted, the processor performs its built-in self-test (BIST) before initialization and code fetching begin.

The processor samples RESET at all times, except in the Stop Clock state and while INIT or PRDY is asserted. System logic can drive the signal either synchronously or asynchronously (see the data sheet for synchronously driven setup and hold times).

Details RESET is typically asserted at power-up by a power-good signal from the power supply, which is turned on by a hardware switch. RESET can also be asserted after power-up. For example, pressing a front-panel button can cause a BIOS interrupt to write to an I/O port (such as port 64h in the keyboard controller). After RESET, the operating system usually determines the cause of the reset (reset during or after power-up) with another BIOS interrupt that queries another I/O port (such as location 0Fh in the CMOS memory at ports 70h and 71h), and it uses this information to determine whether a full power-on self-test (POST) of the system should be run.
Starting at the falling edge of a recognized RESET, the processor performs the following actions, in the order shown:

1. Flush Pipeline — The processor invalidates the instruction pipeline.

2. Reinitialize — The processor reinitializes the following resources to reset values:

• General-purpose registers

• System registers

• Floating-point registers

• Model-specific registers (MSRs)

• Data-cache tag directory (linear and physical) and data array. No writebacks are performed.

• Instruction-cache tag directory (linear and physical) and instruction array

• Translation look-aside buffer (TLB)

• Branch-prediction bits

• Interrupt flag (IF) in EFLAGS, which is cleared to 0

3. Jump To BIOS — The processor jumps to physical address FFFF_FFF0h, the same entry point used after INIT, where it expects to find the BIOS entry point.

The contents of the AMD-K5 processor registers at the conclusion of RESET or INIT are identical to those of the Pentium processor, except that the CPU ID in EDX is 0000_050xh. The upper byte of DX (DH) contains 05h and the lower byte of DX (DL) contains 0xh, the processor’s type and stepping identifier.
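Software that inspects EDX immediately after RESET or INIT can split it into the two bytes described above. The C sketch below does so and checks for the 0000_050xh form; the names are hypothetical, and because the text leaves the "x" digit unspecified, the check ignores the low nibble.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the reset signature left in EDX. */
typedef struct {
    uint8_t dh; /* upper byte of DX: 05h on the AMD-K5 processor */
    uint8_t dl; /* lower byte of DX: 0xh, low nibble not fixed by the text */
} reset_signature;

reset_signature split_reset_edx(uint32_t edx)
{
    reset_signature s;
    s.dh = (uint8_t)((edx >> 8) & 0xFFu); /* bits 15-8 */
    s.dl = (uint8_t)(edx & 0xFFu);        /* bits 7-0 */
    return s;
}

/* True when EDX has the 0000_050xh form, for any value of x. */
bool is_k5_reset_edx(uint32_t edx)
{
    return (edx & 0xFFFFFFF0u) == 0x00000500u;
}
```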

Table 5-15 shows the contents of registers after RESET or 
INIT. Table 5-16 shows the state of the processor’s outputs 
after RESET. 



Table 5-15. Register State After RESET or INIT

Register                 Contents (hex)
EIP                      FFFF_FFF0
EFLAGS                   0000_0002
EAX                      0000_0000
EBX                      0000_0000
ECX                      0000_0000
EDX                      0000_050x
ESI                      0000_0000
EDI                      0000_0000
EBP                      0000_0000
ESP                      0000_0000
FPU Stack R7-R0          0000_0000_0000_0000_0000
FPU Exception Pointer    0_0000_0000_0000
CS                       F000
SS                       0000
DS                       0000
ES                       0000
FS                       0000
GS                       0000
GDTR                     base:0000_0000 limit:0000
IDTR                     base:0000_0000 limit:0000
TR                       0000
LDTR                     0000
CR0                      6000_0010
CR2                      0000_0000
CR3                      0000_0000
CR4                      0000_0000
DR7                      0000_0400
DR6                      FFFF_0FF0
DR3                      0000_0000
DR2                      0000_0000
DR1                      0000_0000
DR0                      0000_0000
Table 5-16. Outputs at RESET

Output        RESET State
ADS           1
A31-A3        Floating
APCHK         1
BE7-BE0       FFh
BREQ          1
BRDY          1
BRDYC         1
CACHE         1
D/C           0
D63-D0        Floating
DP7-DP0       00h
FERR          1
HIT           1
HITM          1
HLDA          0
LOCK          1
M/IO          0
PCD           0
PCHK          1
PRDY          0
PWT           0
W/R           0



Unlike INIT, RESET reinitializes the processor’s entire state. In addition to the states reinitialized by INIT, RESET also reinitializes the contents of the caches, the floating-point registers, the control registers, and the model-specific registers.

A20M should not be asserted during RESET. The operating 
system alone is responsible for controlling the state of A20M 
by writing to an external register provided for this purpose. 
(See the description of A20M on page 5-18.) 

Because the processor boots in Real mode, the memory address decoder must alias the physical address FFFF_FFF0h to the physical address 000F_FFF0h, which lies within the 1-Mbyte address limit of Real mode. (The physical address 000F_FFF0h is sometimes written in selector:offset format as F000:FFF0.) This reset-address behavior of the x86 architecture is due to the special way in which segment translation is performed on reset. Normally, a Real-mode 16-bit segment selector is shifted left 4 bits (one hex digit) to form the segment base, which is then added to the 16-bit offset. Thus, F000:FFF0 in selector:offset format becomes a segment base of F0000h added to an offset of FFF0h, yielding the physical address 000F_FFF0h. When RESET is asserted, however, the left shift is not done and the high 16 address bits are all set to 1, yielding the physical address FFFF_FFF0h. Address translation begins to work in the normal Real-mode manner only when the first far jump is executed. This jump loads the code segment register with a 16-bit segment selector, and that code segment load causes the address translation mechanism to begin working normally. The system logic address decoder must make this behavior transparent to software by aliasing the physical address FFFF_FFF0h to the physical address 000F_FFF0h.
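The two address calculations described above can be sketched in Python. The function names are hypothetical, and `alias_to_low_memory` is an illustrative decoder rule that aliases the whole top 64 Kbytes of the address space; the text requires only that FFFF_FFF0h be aliased to 000F_FFF0h and does not specify the size of the aliased region.

```python
def real_mode_address(selector: int, offset: int) -> int:
    """Normal Real-mode translation: (selector << 4) + offset."""
    # Shift the 16-bit selector left one hex digit to form the base.
    # Wraparound above 1 Mbyte (A20 behavior) is not modeled here.
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)

def reset_fetch_address(offset: int) -> int:
    """First fetch after RESET: the high 16 address bits are forced to 1."""
    return 0xFFFF_0000 | (offset & 0xFFFF)

def alias_to_low_memory(phys: int) -> int:
    """Hypothetical decoder rule: map the top 64 Kbytes of the 4-Gbyte
    space onto 000F_0000h-000F_FFFFh, so FFFF_FFF0h and 000F_FFF0h
    select the same ROM location."""
    if phys >= 0xFFFF_0000:
        return 0x000F_0000 | (phys & 0xFFFF)
    return phys

print(hex(real_mode_address(0xF000, 0xFFF0)))   # 0xffff0
print(hex(reset_fetch_address(0xFFF0)))         # 0xfffffff0
print(hex(alias_to_low_memory(0xFFFF_FFF0)))    # 0xffff0
```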

The processor recognizes AHOLD, BOFF, and HOLD while RESET is asserted, but these signals do not intervene in the initialization process, except that they prevent the first code fetch (the jump to BIOS) after the registers are initialized.

While RESET is asserted, the processor recognizes or drives only BF (BF1-BF0), FLUSH, FRCMC, the hold signals (AHOLD, BOFF, HOLD, and HLDA), INIT, and R/S.

Unlike the Pentium processor, the AMD-K5 processor does not recognize RESET in the Hardware Debug Tool (HDT) mode. System hardware or software must exit the HDT mode (by driving R/S High) before asserting RESET.
5.2.46 SCYC (Split Cycle)

Output 

Summary
The processor asserts SCYC during misaligned, locked transfers on the D63-D0 data bus. The processor generates additional bus cycles to complete the transfer of misaligned data.

Driven and Floated
The processor drives SCYC from the clock in which ADS is asserted until the last expected BRDY of the bus cycle.

SCYC may be driven during any memory and I/O cycles, whether locked or not, but it is only meaningful during locked memory cycles in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM. SCYC is not driven, or is not meaningful, during unlocked memory cycles, I/O cycles, inquire cycles, special bus cycles, or interrupt acknowledge operations; in the Shutdown, Halt, Stop Grant, or Stop Clock states; while BOFF, HLDA, RESET, or INIT is asserted; or while PRDY is asserted. While AHOLD is asserted, SCYC is driven only to complete a locked memory cycle already begun before the assertion of AHOLD.

The processor floats SCYC one clock after system logic asserts BOFF and in the same clock that the processor asserts HLDA.

Details
For purposes of bus cycles, the term aligned means:

■ 2- and 4-byte transfers lie within 4-byte address boundaries 

■ 8-byte transfers lie within 8-byte address boundaries 

(For purposes of exceptions, the term aligned means situated on the natural boundaries of an instruction or operand. Thus, a 2-byte transfer that crosses a 2-byte address boundary may incur an alignment exception, but as long as it does not cross a 4-byte boundary it is performed as an aligned bus cycle.)

If data on D63-D0 is misaligned, the processor generates additional bus cycles to complete the transfer. For example, if a 4-byte transfer begins at address x07h, one byte is transferred during the first bus cycle and the remaining three bytes are transferred during a second bus cycle, which normally occurs immediately after the first (unless another event intervenes, such as an interrupt or a bus backoff). If the misaligned transfer is run as a locked cycle, the processor asserts both LOCK and SCYC throughout the misaligned sequence of bus cycles.

If memory reads, memory writes, or I/O reads are misaligned,
the AMD-K5 processor runs the bus cycles in the opposite 
order of the Pentium processor. The AMD-K5 processor trans- 
fers the low-address portion followed by the high-address por- 
tion instead of the high-address portion followed by the low- 
address portion. 

I/O writes, however, are performed in the same order on both 
processors. 
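The alignment rules and the two cycle orders can be illustrated with a short Python sketch. It assumes that each misaligned transfer is cut at the 4-byte or 8-byte boundary that applies to its size; the helper names are hypothetical, and the sketch does not model I/O writes, whose order the text describes only as the same on both processors.

```python
def split_transfer(addr: int, size: int, pentium_order: bool = False):
    """Split a memory transfer into (address, byte_count) bus cycles.

    2- and 4-byte transfers must lie within a 4-byte boundary and
    8-byte transfers within an 8-byte boundary. Each piece is assumed
    to end at the applicable boundary (a sketch, not a cycle model).
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("unsupported transfer size")
    boundary = 8 if size == 8 else 4
    pieces = []
    cur, remaining = addr, size
    while remaining:
        room = boundary - (cur % boundary)  # bytes left before the boundary
        n = min(room, remaining)
        pieces.append((cur, n))
        cur += n
        remaining -= n
    # AMD-K5: low-address piece first (memory reads/writes, I/O reads).
    # Pentium: high-address piece first for the same kinds of cycles.
    return list(reversed(pieces)) if pentium_order else pieces

def scyc_expected(addr: int, size: int, locked: bool) -> bool:
    """SCYC is meaningful for locked transfers needing more than one cycle."""
    return locked and len(split_transfer(addr, size)) > 1

print(split_transfer(0x07, 4))                      # [(7, 1), (8, 3)]
print(split_transfer(0x07, 4, pentium_order=True))  # [(8, 3), (7, 1)]
```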

5.2.47 SMI (System Management Interrupt)

Input 

Summary
The assertion of SMI causes the processor to enter System Management Mode (SMM). In this mode, which can be transparent to standard system and application software, an SMM interrupt service routine accesses a memory space separate from main memory. SMM is most commonly used for power management, although it is not limited to that function.

The processor samples SMI every clock and recognizes it at the next instruction boundary. SMI is a falling-edge-triggered interrupt with an internal pullup resistor. It is latched when sampled. When recognized, SMI is acknowledged with SMIACT after the later of (a) the last expected BRDY of any in-progress bus cycle, or (b) the assertion of EWBE with or following the last expected BRDY. SMI must be negated for at least four clocks before being asserted. It must be asserted at least three clocks before BRDY if it is to be recognized on the instruction boundary associated with that BRDY.

SMI is sampled during memory cycles (including cache writethroughs and writebacks), cache accesses, I/O cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; in the Shutdown, Halt, or Stop Grant states; or while AHOLD, BOFF, or HLDA is asserted. SMI is not sampled in the Stop Clock state, or while RESET, INIT, or PRDY is asserted.

SMI is the fourth-highest-priority external interrupt. For 
details on its relationship to other interrupts and exceptions, 
see Section 5.1.3 on page 5-13 and Table 5-3 on page 5-16. 

System logic can drive the signal either synchronously or asynchronously (see the data sheet for synchronously driven setup and hold times).

Details
SMI is typically driven by a power management block of system logic that monitors activity on processor outputs, such as the address and cycle definition signals, in conjunction with a timer. An SMM interrupt service routine in firmware controls events during SMM. The most common applications involve power management via clock and/or I/O device control. For example, the external power management logic may notice that an I/O device has not been accessed for several minutes. The power management logic can then assert SMI, and the SMM service routine can obtain relevant information from the power management logic with which to make power-down decisions under program control. These decisions can be communicated back to the power management logic, which in turn can power the I/O device down and assert STPCLK to the processor.

Upon recognizing an SMI interrupt at the next instruction 
retirement boundary, the processor performs the following 
actions, in the order shown: 

1. Flush Pipeline — The processor invalidates all instructions 
remaining in the pipeline. 

2. Complete In-Progress Cycle — If the processor had begun a 
bus cycle when SMI was asserted, the processor completes 
the bus cycle and waits until the system asserts the last 
expected BRDY and also asserts EWBE. 

3. Acknowledge — After sampling EWBE asserted, the processor asserts SMIACT to acknowledge the interrupt. At that point, system logic must ensure that all memory accesses during SMM are to the SMM memory space.

4. Save Processor State — The processor saves its state in a 512- 
byte SMM state-save area at the top of the 32-Kbyte SMM 
memory area, starting at default physical location 
0003_FFFFh and filling down. 

5. Disable Interrupts and Debug Traps — The processor disables 
maskable interrupts by clearing the interrupt flag (IF) in 
EFLAGS, disables NMI interrupts, clears the trap flag (TF) 
in EFLAGS, and clears the DR7-DR6 debug control and sta- 
tus registers. 

6. Service Interrupt — The processor jumps to the entry point of 
the SMM service routine at the SMM base physical address, 
whose default is 0003_8000h in SMM memory. The SMM 
base address can be rewritten with another address while 
the processor is in SMM. The new address is written to the 
SMM base slot in the SMM state-save area and is stored 
internally in the processor. 

The processor does not assert SMIACT until it sees EWBE 
asserted. This ensures that any write data in external write 
buffers is written to the proper memory space (main memory, 
not SMM memory) before address decoding switches memory
references to the SMM memory space. If no bus cycle is in 
progress when SMI is asserted, or if the system does not imple- 
ment external write buffers, system logic may assert EWBE at 
the same time as SMI or at some later time. If a bus cycle is in 
progress when SMI is asserted, EWBE must be asserted with 
the last expected BRDY or later. 

The default physical location for the 64-Kbyte SMM memory 
area is between 0003_0000h and 0003_FFFFh, of which a mini- 
mum 32-Kbyte region between 0003_8000h and 0003_FFFFh 
must be populated with RAM. The memory controller normally 
uses the processor’s assertion of SMIACT to enable SMM mem- 
ory. The BIOS and system logic typically remap the SMM mem- 
ory area from its default location in low memory to high or 
extended memory. System logic must ensure that, during 
SMM, all memory accesses are to this SMM memory area or a 
remapped location. 
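The default SMM addresses quoted above follow from simple offsets within the 64-Kbyte area, as this Python sketch shows. The function name and the assumption that a remapped area keeps the same internal offsets are illustrative only; the text itself gives only the default locations.

```python
SMM_AREA_SIZE = 0x1_0000   # 64-Kbyte SMM memory area
SAVE_AREA_SIZE = 0x200     # 512-byte state-save area

def smm_layout(area_base: int = 0x0003_0000) -> dict:
    """Derive SMM addresses from the base of the 64-Kbyte area.

    The default base reproduces the numbers in the text. Applying the
    same offsets to a remapped area is an assumption of this sketch.
    """
    top = area_base + SMM_AREA_SIZE - 1
    return {
        "area": (area_base, top),
        # Upper 32 Kbytes, which must be populated with RAM.
        "required_ram": (area_base + 0x8000, top),
        # Default entry point of the SMM service routine.
        "entry_point": area_base + 0x8000,
        # State-save area fills down from the top of the area.
        "save_area": (top - SAVE_AREA_SIZE + 1, top),
    }

layout = smm_layout()
print(hex(layout["entry_point"]))              # 0x38000
print([hex(a) for a in layout["save_area"]])   # ['0x3fe00', '0x3ffff']
```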

In general, system designs that do not overlap the address 
space of SMM memory and main memory are simpler and may 
perform better. However, if SMM memory space overlaps main 
memory space that is cacheable, FLUSH must be asserted 
when SMI is asserted so that memory accesses in SMM do not 
hit locations cached from main memory. The FLUSH is per- 
formed first, because it is a higher-priority interrupt. 

If SMM memory is to be cacheable, FLUSH should also be 
asserted with SMI when entering SMM, and the SMM service 
routine should execute the WBINVD instruction to invalidate 
the caches when leaving SMM, just prior to executing the RSM 
instruction. If SMM memory is to be noncacheable, KEN must 
be negated when FLUSH and SMI are asserted. 

SMM addresses and operands default to 16 bits, addresses are 
translated in the same manner as in Real mode, and the full 4 
Gbytes can be accessed without a segment limit violation. 
Unlike the Pentium processor, the AMD-K5 processor does not 
recognize A20M in SMM. The processor exits SMM (that is, the 
SMM service routine) when it executes the RSM instruction. 
This instruction causes the processor to copy the contents of 
the SMM state-save area into the processor’s registers and 
flush the instruction pipeline. Then, the processor continues 
executing instructions at the location specified by the CS:EIP 
value from the state-save area (which will be where the proces- 
sor left off when it recognized SMI, unless the value is altered
by the SMM service routine). 

If the assertion of SMI was recognized on the boundary of an I/O instruction, the I/O trap restart feature of SMM can optionally be used to restart the I/O instruction when returning from SMM. The SMM service routine implements this restart feature by writing the value 00FFh into the I/O trap restart slot of the SMM state-save area. If the value is 00FFh (rather than its default, 0000h) upon return from SMM, the processor decrements the instruction pointer and re-executes the I/O instruction. This is useful, for example, if an I/O write to disk finds the disk powered down. The external power management logic monitoring such an access can assert SMI. In this case, the SMM service routine would query the power management logic, find a failed I/O write, take action to power up the I/O device, enable the I/O restart feature by writing the value 00FFh into the I/O trap restart slot, and return.

During a simultaneous SMI I/O trap (for I/O instruction restart) and debug breakpoint trap, the AMD-K5 processor responds to the SMI first and postpones writing the exception-related information to the stack until after the return from SMM via the RSM instruction. (If debug registers DR3-DR0 are used in SMM, they must be saved and restored by the SMM software; the processor automatically saves and restores DR7-DR6.) If the I/O trap restart slot in the SMM state-save area contains the value 00FFh when the RSM instruction is executed, the debug trap does not occur until after the I/O instruction is re-executed.

The processor recognizes AHOLD, BOFF, and HOLD while SMIACT is asserted, and these signals will intervene in the SMM service routine. After SMI is asserted, subsequent assertions of SMI are masked to prevent recursive entry into SMM. Any other type of exception or interrupt, however, will intervene in the SMM service routine, although the INTR and NMI interrupts are managed in a special way, as described in the following paragraphs. If SMI is asserted during the Stop Grant state, the signal is held pending until the processor exits the Stop Grant state, at which point it is acted upon.

When SMM is entered, the processor disables both INTR and 
NMI interrupts. On both the AMD-K5 and Pentium processors, 
INTR interrupts are disabled by clearing the IF flag in 
EFLAGS. But the mechanism by which NMI interrupts are dis-
abled and subsequently recognized differs between the 
AMD-K5 and Pentium processors. 

During SMM, the Pentium processor does not respond to NMI 
until the beginning of its response to the first INTR or software 
interrupt (INTn) to occur after entering SMM. NMIs can thus 
be enabled by using a dummy interrupt. When an INTR or soft- 
ware interrupt is recognized, the processor first responds to a 
pending NMI interrupt before executing the first instruction of 
the INTR handler. By contrast, the AMD-K5 processor recog- 
nizes a pending NMI interrupt after returning (via the IRET 
instruction) from a prior interrupt. 

The same dummy interrupt used on the Pentium processor to 
enable NMI recognition during SMM works on the AMD-K5 
processor. The only difference is that the AMD-K5 processor 
responds to the NMI after the IRET of the dummy interrupt 
whereas the Pentium processor responds at the beginning of 
the dummy interrupt. 

During debugging using the R/S and PRDY protocol, the 
debugger can force the processor into SMM but the processor 
will not recognize SMI in the Hardware Debug Tool (HDT) 
mode. 

For further details on the System Management Mode, see 
Chapter 6. 

5.2.48 SMIACT (System Management Interrupt Active)

Output

Summary
The processor acknowledges the assertion of SMI with the assertion of SMIACT. The acknowledgment signifies the processor’s readiness to enter System Management Mode (SMM) and begin executing the service routine for that interrupt.

Driven
The processor drives SMIACT beginning after the later of (a) the last expected BRDY of any in-progress bus cycle, or (b) the assertion of EWBE with or following the last expected BRDY, and continues to drive it until the return from the SMM interrupt handler via the RSM instruction.

SMIACT is driven during memory cycles (including cache writethroughs and writebacks), cache accesses, I/O cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; in the Shutdown, Halt, or Stop Grant states; or while AHOLD, BOFF, HLDA, or PRDY is asserted. SMIACT is not driven in the Stop Clock state or while RESET is asserted.

Details
The memory controller normally uses the assertion of SMIACT to enable SMM memory, so that the first memory access in SMM is to the base of the state-save area in the SMM memory space.

The processor remains in SMM, continuing to assert SMIACT, 
until it executes the RSM instruction. For more information 
regarding SMM, see the description of SMI on page 5-116, Sec- 
tion 6.1.4 on page 6-5, and Section 6.3 on page 6-23. 

5.2.49 STPCLK (Stop Clock)

Input 

Summary
The assertion of STPCLK causes the processor to complete any in-progress bus cycle and enter the Stop Grant state (processor’s internal clock stopped), from which it can subsequently transition to the Stop Clock state (bus clock stopped). These low-power clock states can be entered from the normal operating modes, System Management Mode (SMM), or the Halt state.

The processor samples STPCLK every clock and recognizes it 
at the next instruction boundary. STPCLK is a level-sensitive 
interrupt with an internal pullup resistor. The signal must be 
held asserted until recognized. When STPCLK is recognized 
and EWBE is asserted, the processor acknowledges it by driv- 
ing a Stop Grant special bus cycle, waits for BRDY, then stops 
its internal clock and floats D63-D0 and DP7-DP0. 

STPCLK is sampled during memory cycles (including cache writethroughs and writebacks), cache accesses, I/O cycles, locked cycles, special bus cycles, and interrupt acknowledge operations in the normal operating modes (Real, Protected, and Virtual-8086) and in SMM; or in the Shutdown, Halt, or Stop Grant states. STPCLK is not sampled in the Stop Clock state, or while RESET, INIT, or PRDY is asserted. STPCLK is not meaningful if it is asserted while AHOLD, BOFF, or HLDA is asserted, because the processor cannot drive the Stop Grant special bus cycle.

STPCLK is the lowest-priority external interrupt. For details on its relationship to other interrupts and exceptions, see Section 5.1.3 on page 5-13 and Table 5-3 on page 5-16.

System logic can drive the signal either synchronously or asyn- 
chronously (see the data sheet for synchronously driven setup 
and hold times). 

Details
In typical PC systems that implement power control, the STPCLK, CLK, and SMI signals are driven by external power management logic. This logic monitors activity on the address and cycle definition signals. In a typical case, the power management logic may notice that, after it has initiated SMM to power down one or more I/O devices, another several minutes have elapsed without activity. The power management logic can then assert SMI again. The SMM service routine obtains the relevant information and decides to power down the processor itself, and that decision is communicated to the power management logic, which asserts STPCLK to the processor and, optionally, stops driving CLK to the processor and other logic.

Upon recognizing a STPCLK interrupt at the next instruction 
retirement boundary, the processor performs the following 
actions, in the order shown: 

1. Flush Pipeline — The processor invalidates all instructions 
remaining in the pipeline. 

2. Complete In-Progress Cycle — If the processor had begun a 
bus cycle or locked operation when STPCLK was asserted, 
the processor completes the bus cycle and waits until the 
system asserts the last expected BRDY and also asserts 
EWBE. If no bus cycle is in progress, system logic must 
assert EWBE at the same time or at some time after it 
asserts STPCLK. 

3. Acknowledge — After sampling EWBE asserted, the processor drives a Stop Grant special bus cycle. This cycle is identified by D/C = 0, M/IO = 0, W/R = 1, BE7-BE0 = FBh, and A31-A3 = 10h. System logic must respond with BRDY.

4. Stop Internal Clock — When system logic returns BRDY for 
the Stop Grant special bus cycle, the processor stops its 
internal clock and floats D63-D0 and DP7-DP0. 

5. (Optional) Stop Bus Clock — After returning BRDY in 
response to the Stop Grant special bus cycle, power man- 
agement logic can transition to the Stop Clock state by stop- 
ping CLK while STPCLK is held asserted. This reduces 
power consumption to its minimum. 
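A system-logic decoder has to recognize the Stop Grant cycle from the signal values listed in step 3 above. The Python sketch below checks those values; the class and function names are hypothetical, byte enables are treated as active Low (so FBh enables only byte lane 2), and the A31-A3 value is compared exactly as the text gives it (10h).

```python
from typing import NamedTuple

class BusCycle(NamedTuple):
    """Cycle definition and address/byte-enable values for one bus cycle."""
    d_c: int     # D/C: 1 = data, 0 = code (also 0 for special cycles)
    m_io: int    # M/IO: 1 = memory, 0 = I/O (also 0 for special cycles)
    w_r: int     # W/R: 1 = write, 0 = read
    be: int      # BE7-BE0 as an 8-bit value (active Low: 0 = lane enabled)
    a31_a3: int  # value on A31-A3, compared as written in the text

def is_stop_grant(cycle: BusCycle) -> bool:
    """True if the cycle matches the Stop Grant special bus cycle encoding."""
    return (cycle.d_c == 0 and cycle.m_io == 0 and cycle.w_r == 1
            and cycle.be == 0xFB and cycle.a31_a3 == 0x10)

def enabled_byte_lanes(be: int) -> list:
    """Byte lanes whose active-Low byte enable is asserted."""
    return [lane for lane in range(8) if not (be >> lane) & 1]

cycle = BusCycle(d_c=0, m_io=0, w_r=1, be=0xFB, a31_a3=0x10)
print(is_stop_grant(cycle))          # True
print(enabled_byte_lanes(cycle.be))  # [2]
```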

STPCLK must be held asserted throughout the Stop Grant and (if entered) Stop Clock states. In fewer than 10 clocks after STPCLK is negated, the processor returns to the state from which it entered Stop Grant and can recognize any latched interrupts or drive ADS.

The processor enters the Halt state from the normal operating modes (Real, Protected, or Virtual-8086) or SMM when it executes the HLT instruction. The processor leaves the Halt state and returns to its prior operating mode when RESET, SMI, INIT, NMI, or INTR is asserted. If STPCLK is asserted within the Halt state, the processor transitions to the Stop Grant state; it then returns to the Halt state when STPCLK is negated. No processor registers are saved before entering the Halt state because the processor returns to the next unexecuted instruction in program order when it returns to its prior operating mode. Within the Halt state, the processor disables the majority of its internal clock distribution and (if STPCLK is asserted) the internal pullup resistor on STPCLK. However, its phase-lock loop still runs, its key internal logic is still clocked, most of its inputs and outputs retain their last state (except D63-D0 and DP7-DP0, which are floated), and it still responds to input signals.

AMD-K5 Processor Technical Reference Manual
18524C/0-Nov 1996

The assertion of STPCLK causes the processor to enter the 
Stop Grant state. The processor can enter the Stop Grant state 
from the normal operating modes (Real, Protected or Virtual- 
8086), SMM, or the Halt state. When STPCLK is negated, the 
processor leaves the Stop Grant state and returns to the mode 
from which it entered. If the Stop Grant state was entered from 
the Halt state, negation of STPCLK returns the processor to 
the Halt state. Otherwise, negation of STPCLK or assertion of 
RESET returns the processor to a normal operating mode 
(Real, Protected or Virtual-8086) or SMM. If INIT is asserted in 
the Stop Grant state, the signal is latched and acted upon after 
STPCLK is negated. No processor registers are saved before 
entering the Stop Grant state because the processor returns to 
the next unexecuted instruction in program order when it 
returns to its prior operating mode. Within the Stop Grant 
state (as in the Halt state) the processor disables the majority 
of its internal clock distribution and (if STPCLK is asserted) 
the internal pullup resistor on STPCLK. However, its phase- 
lock loop still runs, its key internal logic is still clocked, most 
of its inputs and outputs retain their last state (except D63-D0 
and DP7-DP0, which are floated), and it still responds to input 
signals. 

An inquire cycle driven while the processor is in the Stop Grant state or the Halt state causes the processor to transition to the Stop Grant Inquire state. As for inquire cycles driven from any other state, system logic must assert AHOLD, BOFF, or HOLD to obtain the address bus before driving EADS, INV, and the inquire address. The processor responds normally by driving HITM and/or HIT and performing any necessary cache-state transition. If HITM is asserted, the processor drives a normal writeback (immediately if AHOLD is asserted, or delayed if BOFF or HOLD is asserted) and returns to the state from which it entered the Stop Grant Inquire state in the clock in which it negates HITM. If HITM is not asserted, the processor returns two clocks after EADS.

The processor enters the Stop Clock state when system logic 
turns off CLK while STPCLK is asserted. This is the minimum- 
power state and it can only be entered from the Stop Grant 
state after BRDY has been returned for the Stop Grant special 
bus cycle. In the Stop Clock state, the processor’s phase-lock 
loop and I/O buffers are disabled, except for the I/O buffers on 
CLK and the Test Access Port (TAP) signals. System logic 
should not change the state of any signals, and the processor 
does not recognize any signal edges in the Stop Clock state. 
When CLK is restarted, the processor returns to the Stop Grant 
state, responds to inputs in the next clock, but cannot drive bus 
cycles until its phase-lock loop is synchronized. The latter 
takes several clocks (see the data sheet for this specification). 
When CLK is restarted, it can be driven at a different frequency, and/or the bus-to-processor clock ratio can be changed on the BF input(s).

Thus, when CLK is restarted, the processor can:

■ Respond to AHOLD, BOFF, or HOLD in the next clock after CLK restarts,

■ Transition to the Stop Grant Inquire state as early as two clocks after the assertion of AHOLD, two clocks after the assertion of BOFF, or one clock after the assertion of HLDA (if system logic drives an inquire cycle with EADS, INV, and an inquire address), and

■ Drive HITM and/or HIT two clocks after EADS.

However, if the inquire cycle hits a modified line, the processor 
does not drive the writeback until several clocks after CLK 
restarts (see the data sheet). In this case, the only indication 
system logic receives of the writeback is the ADS that initiates 
it. 

Thus, the processor recognizes AHOLD, BOFF, and HOLD during the Stop Grant and Stop Grant Inquire states but not during the Stop Clock state. When asserted in the Stop Grant state, these signals cause the processor to restart its internal clock and transition to the Stop Grant Inquire state. When the processor is in the Stop Clock state, however, CLK must be restarted before any other signals are changed.
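The state transitions described in this section can be summarized as a small state machine. This Python sketch is a simplification under stated assumptions: the class and event names (such as STPCLK_ASSERTED and INQUIRE) are hypothetical labels, INQUIRE stands for the whole AHOLD/BOFF/HOLD plus EADS sequence, and RESET, latched interrupts, and clock-level timing are not modeled.

```python
class ClockControl:
    """Simplified model of the Halt, Stop Grant, Stop Grant Inquire,
    and Stop Clock transitions (hypothetical event names)."""

    def __init__(self, mode: str = "NORMAL"):
        if mode not in ("NORMAL", "SMM"):
            raise ValueError("start in a normal operating mode or SMM")
        self.state = mode
        self.resume = []  # stack of states to return to, innermost last

    def _enter(self, new_state: str) -> None:
        self.resume.append(self.state)
        self.state = new_state

    def _leave(self) -> None:
        self.state = self.resume.pop()

    def event(self, ev: str) -> str:
        s = self.state
        if s in ("NORMAL", "SMM"):
            if ev == "HLT":
                self._enter("HALT")
            elif ev == "STPCLK_ASSERTED":
                self._enter("STOP_GRANT")
        elif s == "HALT":
            if ev == "STPCLK_ASSERTED":
                self._enter("STOP_GRANT")
            elif ev == "INQUIRE":
                self._enter("STOP_GRANT_INQUIRE")
            elif ev in ("SMI", "INIT", "NMI", "INTR"):
                self._leave()  # back to the prior operating mode
        elif s == "STOP_GRANT":
            if ev == "STPCLK_NEGATED":
                self._leave()  # back to Halt or the prior mode
            elif ev == "CLK_STOPPED":
                self.state = "STOP_CLOCK"
            elif ev == "INQUIRE":
                self._enter("STOP_GRANT_INQUIRE")
        elif s == "STOP_CLOCK":
            # No signal edges are recognized until CLK restarts.
            if ev == "CLK_RESTARTED":
                self.state = "STOP_GRANT"
        elif s == "STOP_GRANT_INQUIRE":
            if ev == "INQUIRE_DONE":
                self._leave()
        return self.state

cc = ClockControl()
for ev in ["HLT", "STPCLK_ASSERTED", "CLK_STOPPED", "CLK_RESTARTED",
           "STPCLK_NEGATED", "INTR"]:
    print(ev, "->", cc.event(ev))
```

A stack of return states is used because Stop Grant can be entered from Halt, which was itself entered from a normal mode or SMM, and each exit must unwind to the state it came from.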

STPCLK is the lowest-priority interrupt, as shown in Table 5-3 on page 5-16. R/S is the only interrupt or exception that is acted upon while STPCLK is asserted, and R/S is acted upon only in the Stop Grant state, not the Stop Clock state. Edge-triggered interrupts (FLUSH, SMI, INIT, NMI) are not latched in the Stop Clock state; however, they are latched in the Stop Grant state and are recognized after STPCLK is negated.

The AMD-K5 and Pentium processors differ in their support 
for STPCLK in the following ways: 

■ In the Halt state, the AMD-K5 processor responds to STP- 
CLK by entering the Stop Grant state. The Pentium proces- 
sor ignores STPCLK in the Halt state. 

■ The Pentium processor guarantees that at least one instruc- 
tion will be executed between the negation of STPCLK and 
a subsequent reassertion of STPCLK. The AMD-K5 proces- 
sor does not guarantee this. 

■ In the Halt or Stop Grant states, the AMD-K5 processor cannot enter a low-power state if it does not have the bus (that is, if AHOLD, BOFF, or HLDA is asserted). The same may not be true of the Pentium processor.

For further details on clock control and power management, 
see Section 6.4 on page 6-33 and Section 6.6 on page 6-40. 

5.2.50 TCK (Test Clock)

Input

Summary
TCK is the clock for boundary-scan testing using the Test Access Port (TAP).

Sampled
The processor always samples TCK, except while RESET or INIT is asserted. The signal has an internal pullup resistor.

Details
Data and state definition are clocked into the processor on the rising edge of TCK. The outputs on TDO are driven valid on the falling edge of TCK. When TCK stops on its falling edge, the state of the test latches in the processor is held.

Section 7.8 on page 7-19 summarizes the implementation of 
TAP testing on the AMD-K5 processor. System logic should tie 
TCK High if TAP testing is not implemented. 

See the IEEE Standard Test Access Port and Boundary-Scan 
Architecture (IEEE 1149.1) specification for details on how the 
TAP signals and instructions are used for testing. The TAP is 
often called the Joint Test Action Group (JTAG) port, after the 
committee that proposed the IEEE TAP standard. 

5.2.51 TDI (Test Data Input)

Input

Summary
TDI carries input test data and instructions for testing on the Test Access Port (TAP).

Sampled
The processor samples TDI on every rising TCK edge, but only during the shift_IR and shift_DR states. TDI has an internal pullup resistor.

TDI is always sampled in those states, except while RESET or INIT is asserted.

Details
Instructions are shifted into the processor on TDI during the shift_IR TAP state. Data are shifted into the processor on TDI during the shift_DR TAP state.

See the IEEE Standard Test Access Port and Boundary-Scan 
Architecture (IEEE 1149.1) specification for a description of 
how the TAP signals and instructions are used for testing. 

5.2.52 TDO (Test Data Output)

Output

Summary
TDO carries output data for testing on the Test Access Port (TAP).

Driven and Floated
The processor drives TDO on every falling TCK edge, but only during the shift_IR and shift_DR states. It is floated at all other times.

TDO is not driven while RESET or INIT is asserted.

Details
Instructions are shifted out of the processor on TDO during the shift_IR TAP state. Data are shifted out of the processor on TDO during the shift_DR TAP state.

See the IEEE Standard Test Access Port and Boundary-Scan 
Architecture (IEEE 1149.1) specification for a description of 
how the TAP signals and instructions are used for testing. 

5.2.53 TMS (Test Mode Select)

Input

Summary
TMS specifies the test function and sequence of test changes for testing on the Test Access Port (TAP).

Sampled
The processor samples TMS on every rising TCK edge. TMS has an internal pullup resistor.

TMS is always sampled, except while RESET or INIT is asserted.

Details
If TMS is held asserted (High) for five or more TCK clocks, the TAP controller enters its Test-Logic-Reset state, regardless of the controller state. This action is the same as that achieved by asserting TRST.

See the IEEE Standard Test Access Port and Boundary-Scan 
Architecture (IEEE 1149.1) specification for a description of 
how the TAP signals and instructions are used for testing. 
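The five-clock reset behavior follows from the TAP controller state diagram defined by IEEE 1149.1. The Python sketch below encodes that standard 16-state next-state table (state names follow the IEEE standard; the table is not reproduced from this manual) and checks that five clocks with TMS = 1 reach Test-Logic-Reset from every state. The helper name `clock_tms` is hypothetical.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def clock_tms(state: str, tms_bits) -> str:
    """Apply one TMS value per rising TCK edge; return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][1 if tms else 0]
    return state

# Five TMS-High clocks reach Test-Logic-Reset from any starting state.
assert all(clock_tms(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)
print(clock_tms("Shift-DR", [1] * 5))  # Test-Logic-Reset
```

From Test-Logic-Reset, the TMS sequence 0, 1, 1, 0, 0 reaches Shift-IR and 0, 1, 0, 0 reaches Shift-DR, the two states in which TDI is sampled and TDO is driven.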

5.2.54 TRST (Test Reset)

Input

Summary
The assertion of TRST initializes the Test Access Port (TAP) by resetting its state machine.

Sampled
TRST is an asynchronous input. Unlike other asynchronous inputs, TRST has no specified synchronous setup and hold times. TRST has an internal pullup resistor.

TRST is always sampled, except while RESET or INIT is asserted.

Details
When TRST is asserted, the TAP controller enters its Test-Logic-Reset state, regardless of the controller state. This action is the same as that achieved by holding TMS asserted for five or more clocks. Asserting TRST is unnecessary at RESET because the processor performs the TAP reset automatically at that point.

See the IEEE Standard Test Access Port and Boundary-Scan 
Architecture (IEEE 1149.1) specification for a description of 
how the TAP signals and instructions are used for testing. 



Signal Descriptions 



5-131 





AMD-K5 Processor Technical Reference Manual 



18524C/0- Nov 1996 



5.2.55 W/R (Write or Read)

Output

Summary
The processor drives W/R to indicate whether it is performing
a write or read cycle on the bus. The signal is driven at the
same time as the other two cycle-definition signals, D/C and
M/IO. One specific encoding of D/C, M/IO, and W/R identifies
the special bus cycles.

Driven and Floated
W/R is driven and floated with the same timing as D/C. See the
description of D/C on page 5-53.

Details
The processor drives W/R according to whether the access is
initiated by the processor’s fetch logic (which can initiate only
reads) or its load/store logic (which can initiate reads or writes
of operands). Such accesses can be done speculatively. Before
the processor fetches an instruction or reads or writes a data
operand, it checks the associated code- or data-segment
descriptor to verify that the action is permitted. The execute
(E) bit in the segment descriptor, which is maintained by the
operating system, distinguishes between data and code
segments, and the R/W bit specifies the segment’s read and
write properties. Code segments can only be read; data and
stack segments can be read-only or read-write.

The processor specifies all special bus cycles with D/C = 0,
M/IO = 0, and W/R = 1. The cycles are then differentiated by
BE7-BE0 and A31-A3.

At the falling edge of RESET, the states of BRDYC and BUS- 
CHK control the drive strength on the A21-A3 (not including 
A31-A22), ADS, HITM, and W/R signals. The drive strength is 
weak for all states of BRDYC and BUSCHK except when 
BRDYC and BUSCHK are both Low, in which case the drive 
strength is strong. The A31-A22 signals use the weak drive 
strength at all times. See the data sheet for details. 
5.2.56 WB/WT (Writeback or Writethrough)

Input

Summary
WB/WT, together with PWT, specifies the data-cache MESI
state of cacheable read misses and write hits.

Sampled
The processor samples WB/WT in the same clock as the first
BRDY of a bus cycle or NA, whichever comes first.

WB/WT is sampled during memory reads and writes, including
writebacks, in the normal operating modes (Real, Protected,
and Virtual-8086) and SMM, and when PRDY is asserted. WB/WT
is not sampled during I/O cycles, locked cycles, special bus
cycles, or interrupt acknowledge operations; or during the
Shutdown, Halt, Stop Grant, or Stop Clock states; or while
BOFF, HLDA, RESET, or INIT is asserted. While AHOLD is
asserted, WB/WT is sampled only to complete a bus cycle
begun before the assertion of AHOLD.

Details

Lines in the shared MESI state are said to be in the
writethrough state. Those in the exclusive or modified MESI
state are said to be in the writeback state. When a write access
either misses the data cache or hits a shared line in the data
cache, the processor drives a 1-to-8-byte write cycle (called a
writethrough) on the bus. When an inquire cycle, internal
snoop, FLUSH operation, or WBINVD instruction hits a
modified line in the data cache, the processor drives a 32-byte
burst write cycle (called a writeback) on the bus. Table 2-2 on
page 2-19 shows the relationships between cache accesses,
writethroughs, and writebacks.

WB/WT and PWT determine the MESI state of a cache line
after a read miss (and resulting cache-line fill) or a write hit.
During read misses, these two signals are interpreted along
with the states of the CACHE output and the KEN input.
During write hits, WB/WT and PWT alone determine the
resulting MESI state of a cache line. Table 5-17 shows the
relationship between WB/WT and PWT for reads, and Table
5-18 shows it for writes. If WB/WT is Low or PWT is High
during a read miss or write hit, the accessed line is cached in,
transitions to, or remains in the shared state after the read or
write. If PWT is Low and WB/WT is High, the accessed line is
cached in, transitions to, or remains in the exclusive state after
a read miss or the first write hit to that line. If the line
transitions to the exclusive state, a subsequent write hit to the
same line transitions the line to the modified state. During
write hits, PWT and WB/WT can only change a line from
shared to exclusive; they cannot change an exclusive line to a
shared line.



Table 5-17. MESI-State Transitions for Reads

Signal or Event             Result of Cache Lookup
                            Read Miss                                         Read Hit
                                                                              shared    exclusive modified
CACHE, PCD (1)              1         -         0         0         0         -         -         -
KEN                         -         1         0         0         0         -         -         -
PWT                         -         -         1         -         0         -         -         -
WB/WT                       -         -         -         0         1         -         -         -
Cache-Line Fill (32 bytes)  no        no        yes       yes       yes       no        no        no
State After Read (2)        -         -         shared    shared    exclusive shared    exclusive modified

Notes:

- Don't care or not applicable.

1. The PCD bit is one determinant of the state of CACHE.

2. Transition occurs after any line fill. Lines in the shared MESI state are said to be in the writethrough state. Those in the exclusive or modified
MESI states are said to be in the writeback state.






Table 5-18. MESI-State Transitions for Writes

Signal or Event             Result of Cache Lookup
                            Write Miss      Write Hit
                                            shared                                          exclusive or modified
CACHE, PCD (1)              -               -               -               -               -
KEN                         -               -               -               -               -
PWT (2)                     -               1               -               0               -
WB/WT                       -               -               0               1               -
Cache Update                no              yes             yes             yes             yes
Write to Memory             writethrough    writethrough    writethrough    writethrough    no
                            (1 to 8 bytes)  (1 to 8 bytes)  (1 to 8 bytes)  (1 to 8 bytes)
State After Write (3)       -               shared          shared          exclusive       modified

Notes:

- Don't care or not applicable.

1. The PCD bit is negated and CACHE is asserted during a write hit, but these states do not affect the hit.

2. The state of the PWT bit in the page-table entry or CR3.

3. Transition occurs after any write to memory. Lines in the shared MESI state are said to be in the writethrough state. Those in the exclusive or
modified MESI states are said to be in the writeback state.
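Tables 5-17 and 5-18 can be read as two small decision functions. The Python sketch below encodes them under these assumptions: each argument is a pin level (0 or 1); CACHE and KEN are active Low, so 0 means asserted, as the tables imply; the effect of PCD is folded into the CACHE level (Table 5-17, note 1); and the function names are hypothetical.

```python
def state_after_read_miss(cache_n, ken_n, pwt, wb_wt):
    """Return the MESI state of the line after a read miss, or None if no fill occurs."""
    if cache_n == 1 or ken_n == 1:
        return None  # CACHE or KEN negated: single-transfer read, no line fill
    if pwt == 1 or wb_wt == 0:
        return "shared"  # writethrough line
    return "exclusive"  # PWT Low and WB/WT High: writeback line


def state_after_write_hit(current, pwt, wb_wt):
    """Return the MESI state after a write hit to a line in the given state."""
    if current in ("exclusive", "modified"):
        return "modified"  # cache updated only; no bus cycle
    # current == "shared": the hit also drives a 1-to-8-byte writethrough.
    if pwt == 0 and wb_wt == 1:
        return "exclusive"
    return "shared"


def write_goes_to_bus(current):
    """A write miss (current is None) or a hit to a shared line is written through."""
    return current in (None, "shared")
```

Note that state_after_write_hit never returns "shared" for an exclusive or modified line, which matches the rule that PWT and WB/WT cannot downgrade an exclusive line.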



In single-processor systems with no other caching master,
WB/WT is typically tied High. This allows the processor to cache
all cacheable reads in the exclusive state and causes all
cacheable write hits to update only the cache. In systems with
multiple caching masters, system logic can run inquire cycles to
all other caching masters and derive WB/WT from their HIT
outputs, driving WB/WT Low (writethrough) if any master
reports a hit. This allows the processor to hold a line in the
writeback (exclusive or modified) state only if no other master
has a copy.
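The multiple-master scheme just described can be sketched as a single function. This Python sketch assumes HIT is an active-Low output on each caching master, as on the Pentium-compatible bus; wb_wt_from_hits is a hypothetical name. Because a hit is signaled by a Low level, "no other master hit" is a logical AND of the HIT pin levels, and WB/WT follows that AND directly.

```python
def wb_wt_from_hits(hit_levels):
    """Return the WB/WT level (1 = High = writeback) to drive to the processor.

    hit_levels: HIT pin levels sampled from the other caching masters after
    system logic has run an inquire cycle to each of them. HIT is assumed to be
    active Low, so 0 means "this master holds a copy of the line".
    """
    # An empty list models a single-processor system: WB/WT is effectively tied High.
    return 1 if all(level == 1 for level in hit_levels) else 0
```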

While the writeback configuration usually supports higher per- 
formance, the writethrough configuration is required for cer- 
tain transitions in the write-once cache protocol. For details on 
this protocol, see Section 6.2.6 on page 6-19. 

In Hardware Debug Tool (HDT) mode, WB/WT is meaningful only
for cache write hits (PWT = 0 and WB/WT = 1 transition a
shared line to an exclusive line). The signal is not meaningful
during cache read misses in HDT mode, because the caches are
never filled in HDT mode.

For more details on data-cache MESI state transitions during 
reads, see Table 5-9 on page 5-51 and Section 6.2.2 on page 6-9. 
5.3 Bus Cycle Overview



The bus signals described in the previous section combine to 
form various types of bus transactions, or bus cycles. This sec- 
tion summarizes the general features of the bus cycles: cycle 
definition, addressing, alignment, and priorities. Section 5.4 
describes the signal timing for specific types of bus cycles. 

5.3.1 Cycle Definitions 

The processor begins driving a bus cycle when it asserts ADS. 
Concurrent with ADS, it drives the set of signals indicated in 
Table 5-19, which define the type of bus cycle. For memory 
reads, memory writes, burst reads, and burst writes, D/C speci- 
fies whether the bus cycle accesses code (instructions) or data. 
M/IU specifies whether the cycle accesses memory or an I/O 
port. W/R specifies whether the cycle is a read or write. The 
assertion of CACHE indicates that the processor is writing or is 
prepared to read a burst cycle consisting of four consecutive 
transfers on the data bus. However, for a read, system logic 
must confirm the burst by asserting KEN, or the bus cycle 
becomes a single-transfer read. I/O accesses are always non- 
burst cycles. 



Table 5-19. Bus Cycle Definitions

Type of Cycle                         Signals                            Comments
                                      D/C      M/IO     W/R      CACHE
Single-Transfer Memory Read or Write  0 or 1   1        0 or 1   1        -
Single-Transfer I/O Read or Write     1        0        0 or 1   1        -
Burst Memory Read or Write            0 or 1   1        0 or 1   0        For reads, system logic must assert KEN with BRDY.
Interrupt Acknowledge                 0        0        0        -        Pair of locked cycles.
Special                               0        0        1        -        Several special cycles distinguished by BE7-BE0
                                                                          and A31-A3. See Table 5-23 on page 5-180.



Interrupt acknowledge operations consist of a locked pair of
read cycles. Special bus cycles are further differentiated by
the signals shown in Table 5-23 on page 5-180. In addition to
the processor-driven bus cycles shown in Table 5-19, system
logic can drive inquire cycles to the processor. These bus
cycles are described later, in Section 5.4.4 on page 5-156.
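Table 5-19 can be expressed as a decoder over the pin levels driven with ADS. The Python sketch below is illustrative only: classify_cycle and its return strings are hypothetical, CACHE is treated as active Low (0 = asserted, consistent with the 0 entry for burst cycles), and special cycles are not further decoded from BE7-BE0 and A31-A3.

```python
def classify_cycle(d_c, m_io, w_r, cache_n):
    """Classify a bus cycle from the pin levels driven with ADS (Table 5-19)."""
    if m_io == 0 and d_c == 0:
        # An "I/O code" encoding: a write is a special cycle, a read is interrupt acknowledge.
        return "special" if w_r == 1 else "interrupt acknowledge"
    if m_io == 0:
        return "single-transfer I/O"  # I/O cycles are never bursts
    if cache_n == 0:
        # Burst write (writeback), or a read that bursts only if KEN confirms it.
        return "burst memory"
    return "single-transfer memory"
```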

The processor samples BRDY during all bus cycles that it 
drives. The number of BRDYs expected by the processor 
depends on the type of bus cycle, as follows: 

■ One BRDY for an aligned single-transfer read or write 
cycle, a special bus cycle, and each of two bus cycles in an 
interrupt acknowledge operation. One additional BRDY for 
each misaligned cycle. 

■ Four BRDYs for burst cycles (one BRDY for each of the four 
transfers). Burst cycles are always aligned. 

The last expected BRDY represents the completion of a proces- 
sor-initiated bus cycle. The processor guarantees at least one 
idle clock between consecutive bus cycles, whether unlocked 
or locked. This means that consecutive locked operations, 
which consist of consecutive bus cycles, also have at least one 
idle clock between them. 
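The BRDY-count rule above reduces to a small lookup. In this Python sketch, expected_brdys and its cycle-type strings are hypothetical names, and extra_cycles stands for the additional bus cycles that a misaligned single transfer generates.

```python
def expected_brdys(cycle_type, extra_cycles=0):
    """Return how many BRDYs the processor samples before the operation completes."""
    if cycle_type == "burst":
        return 4  # one per quadword; bursts are always aligned
    if cycle_type == "interrupt acknowledge":
        return 2  # one for each of the two locked read cycles
    if cycle_type in ("single-transfer", "special"):
        return 1 + extra_cycles  # each extra split cycle needs its own BRDY
    raise ValueError(f"unknown cycle type: {cycle_type}")
```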

5.3.2 Addressing 

The address for a bus cycle is driven on A31-A3 and BE7-BE0. 
A31-A3 carry the upper 29 bits of the address, identifying an 
aligned 8-byte (quadword) region in memory. BE7-BE0 iden- 
tify the accessed bytes in that quadword, in effect indicating 
the three least-significant bits of the address and the size (in 
bytes) of the desired transfer. For burst and inquire cycles, 
A31-A5 are sufficient to identify the memory location of the 
cache line. For burst reads, which are four-transfer cache-line 
fills, system logic should watch A4-A3 and return the 
addressed quadword first, before returning the remainder of 
the cache line. 

More details on burst-cycle addressing are given in Section 
5.4.3 on page 5-149. 
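The address encoding described above can be sketched as follows for an access that has already been split so that it fits within one aligned quadword. The sketch assumes, as Section 5.4.1 states for the timing diagrams, that BE7-BE0 are active Low (a 0 bit selects a byte); encode_address is a hypothetical name.

```python
def encode_address(addr, size):
    """Return (A31-A3 value, BE7-BE0 byte) for a 1-, 2-, 4-, or 8-byte access.

    The access must already fit inside one aligned quadword (split misaligned
    transfers first). BE7-BE0 are active Low: a 0 bit enables that byte lane.
    """
    offset = addr & 0x7  # A2-A0 are not driven; they are folded into BE7-BE0
    if size not in (1, 2, 4, 8) or offset + size > 8:
        raise ValueError("access must fit within one aligned quadword")
    a31_a3 = addr >> 3
    lanes = ((1 << size) - 1) << offset  # 1 bits mark the byte lanes used
    be_n = ~lanes & 0xFF  # invert for active-Low byte enables
    return a31_a3, be_n
```

For example, a 2-byte access at 1003h drives 200h on A31-A3 and E7h on BE7-BE0, enabling byte lanes 3 and 4.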

5.3.3 Alignment 

For purposes of bus cycles, the term aligned means: 
■ 2- and 4-byte transfers lie within 4-byte address boundaries

■ 8-byte transfers lie within 8-byte address boundaries 

(For purposes of exceptions, the term aligned means situated
on the natural boundaries of an instruction or operand. Thus, a
2-byte transfer that crosses a 2-byte (but not a 4-byte) address
boundary may incur an alignment exception, but it will be
performed as an aligned bus cycle.)

If data on D63-D0 is misaligned, the processor generates addi- 
tional bus cycles to complete the transfer. For example, if a 4- 
byte transfer begins at address x07h, one byte will be trans- 
ferred during the first bus cycle and the remaining three bytes 
will be transferred during a second bus cycle, which will nor- 
mally occur immediately after the first bus cycle (unless an 
interrupt or bus backoff intervenes). If the misaligned transfer 
is run as a locked cycle, the processor asserts both LOCK and 
SCYC throughout the misaligned sequence of bus cycles. 

If memory reads, memory writes, or I/O reads are misaligned, 
the AMD-K5 processor runs the bus cycles in the opposite 
order of the Pentium processor. The AMD-K5 processor trans- 
fers the least-significant bytes first followed by the most-signif- 
icant bytes. I/O writes, however, are performed in the same 
order on both processors: the most-significant bytes first, fol- 
lowed by the least-significant bytes. 
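The alignment rules and cycle ordering above can be combined into one small function. This Python sketch assumes the transfer size is 1, 2, 4, or 8 bytes, so a misaligned transfer needs at most two bus cycles; split_transfer and the kind strings are hypothetical names. The test cases reuse addresses from this chapter: a 4-byte transfer at ...07h and the 800Eh and 8Eh transfers of Figure 5-5.

```python
def split_transfer(addr, size, kind):
    """Return the bus cycles for one transfer as (address, byte_count) pairs.

    kind is "mem_read", "mem_write", "io_read", or "io_write"; the list is in
    the order the AMD-K5 processor runs the cycles.
    """
    # Bus alignment: 2- and 4-byte transfers must stay within a 4-byte
    # boundary, 8-byte transfers within an 8-byte boundary.
    boundary = 8 if size == 8 else 4
    first_end = (addr // boundary + 1) * boundary
    if addr + size <= first_end:
        return [(addr, size)]  # aligned for bus purposes: a single cycle
    low = (addr, first_end - addr)  # least-significant bytes
    high = (first_end, addr + size - first_end)  # most-significant bytes
    # I/O writes go MSBs first (same as Pentium); all others go LSBs first.
    return [high, low] if kind == "io_write" else [low, high]
```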

For a misaligned CMPXCHG8B operation (that is, the operand 
does not lie on an 8-byte quadword boundary), the AMD-K5 
processor does two split-cycle reads followed by two split-cycle 
writes, all with LOCK asserted, for a total of eight bus cycles. 
The Pentium processor combines the cycles for a maximum of 
four bus cycles. 
5.3.4 Bus Speed and Typical DRAM Timing

The processor can be configured for external bus (CLK) speeds 
of 50, 60, or 66 MHz. Main DRAM memory can be built from 
Page-mode or EDO (extended data out) DRAM, although faster 
memory devices can be used for higher performance. 

On a 66-MHz bus, the read cycle time for a DRAM-page hit in
EDO DRAM is 7-2-2-2 (7 clocks for the first transfer and 2
clocks for each remaining transfer) and 10-2-2-2 for a
DRAM-page miss. The read cycle time for a DRAM-page hit in
Page-mode DRAM at 66 MHz is 7-4-4-4 and 10-4-4-4 for a
DRAM-page miss. On a 50-MHz bus, there is no change in
timing for EDO DRAM, but Page-mode DRAM timing becomes
6-3-3-3 for a DRAM-page hit and 8-3-3-3 for a DRAM-page miss.
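The x-y-y-y notation sums to the total bus clocks for a four-transfer (32-byte) burst. The Python sketch below, with hypothetical helper names, totals the clocks and converts them to nanoseconds for a given bus frequency. For example, 6-3-3-3 is 15 clocks, or 300 ns at 50 MHz, while 7-2-2-2 at 66 MHz is 13 clocks.

```python
def burst_clocks(timing):
    """Total bus clocks for a four-transfer burst given as, e.g., (7, 2, 2, 2)."""
    return sum(timing)


def burst_ns(timing, bus_mhz):
    """Burst duration in nanoseconds at the given bus clock frequency (MHz)."""
    return burst_clocks(timing) * 1000.0 / bus_mhz
```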

5.3.5 Bus-Cycle Priorities 

The AMD-K5 processor can support only one ongoing bus cycle
at a time; pending bus cycles are not buffered. System logic
maintains ultimate control over the bus. The processor asserts
BREQ to request control of the bus. System logic asserts
AHOLD, BOFF, or HOLD to take control of the bus. AHOLD
passes control of the address bus to system logic for use in
inquire cycles, but permits completion of in-progress cycles on
the data bus. BOFF forces an in-progress bus cycle to abort and
passes control to system logic. HOLD allows an in-progress bus
cycle to complete before passing control to system logic.
5.4 Bus Cycle Timing



The following sections describe and illustrate the timing and
relationship of bus signals during various types of bus cycles.
Only a representative set of bus cycles is illustrated; many
more combinations are possible.

5.4.1 Timing Diagrams 

The timing diagrams show the signals on the external bus as a 
function of time, as measured by the bus clock (CLK). Through- 
out this chapter, the term clock refers to bus-clock cycles, not 
processor-clock cycles, and the term cycle refers to bus cycles,
not clocks. A clock extends from one rising CLK edge to the
next rising CLK edge. The processor samples and drives most 
signals relative to the rising edge of CLK. The exceptions to 
this rule include: 

■ FLUSH and SMI: Sampled on the falling edge of CLK

■ BF (BF1-BF0), FLUSH, FRCMC, and INIT: Sampled on the
falling edge of RESET

■ TDI, TDO, TMS, and TRST: Sampled or driven relative to TCK

For each signal in the timing diagrams, the High level repre- 
sents 1, the Low level represents 0, and the middle level repre- 
sents the floating (high-impedance) state. When both the High 
and Low levels are shown, the meaning depends on the signal. 
For a single signal, it means don’t care. For a bus, it means that 
the processor or system logic is driving a value, but this value 
may or may not be valid (for example, the value on the address 
bus is valid only during the assertion of ADS, although 
addresses are also driven on the bus at other times). 

The value indicated for the address bus represents the value 
driven on lines A31-A3. This value, multiplied by 8, is the byte 
address of an 8-byte region in memory. The value for BE7-BE0 
indicates which bytes in that region are to be transferred: the 
bytes corresponding to the zeros on BE7-BE0 are transferred. 
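Reading a timing diagram therefore takes two steps: multiply the address-bus value by 8, then pick the byte lanes whose BE bits are 0. The Python sketch below illustrates this; bytes_transferred is a hypothetical name.

```python
def bytes_transferred(a31_a3, be_n):
    """List the byte addresses a timing-diagram cycle transfers."""
    base = a31_a3 * 8  # the address-bus value times 8 is the region's byte address
    # A 0 on a BE line (active Low) means that byte lane is transferred.
    return [base + lane for lane in range(8) if ((be_n >> lane) & 1) == 0]
```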

The timing diagrams given in the following sections assume 
that the current privilege level (CPL) is always 0. 
5.4.2 Single-Transfer Reads and Writes

The single-transfer memory and I/O bus cycles transfer 1, 2, 4, 
or 8 bytes. Misaligned instructions or operands result in a split 
cycle, which requires multiple transactions on the bus. During 
single-transfer (non-cacheable) code fetches, the AMD-K5 and 
Pentium processors read 8 bytes, not 16 bytes as the 486 pro- 
cessor does. 

Single-Transfer Memory Read and Write

Figure 5-2 shows a single-transfer doubleword code fetch
(read) from memory, followed immediately by a single-transfer
doubleword write to memory. For the memory-read cycle, the
processor drives A31-A3, BE7-BE0 (with AP for parity check),
D/C, W/R, and M/IO. Then, somewhat later, it asserts ADS and
BREQ. ADS, which is held asserted for only one clock,
validates the bus cycle. The processor then waits for system
logic to return the data on D63-D0 (with DP7-DP0 for parity
check) and assert BRDY. System logic can return BRDY as
early as one clock after ADS, thus supporting very fast memory
devices.

During the read cycle, the processor drives PCD, PWT, and 
CACHE to indicate its caching and cache-coherency intent for 
the access. System logic returns KEN and WB/WT to either con- 
firm or change this intent. In this example, the processor 
asserts PCD and negates CACHE, so the accesses are non- 
cacheable, even though system logic asserts KEN during the 
BRDYs to indicate its support for cacheability. The processor
(which drives CACHE) and system logic (which drives KEN)
must agree in order for an access to be cacheable. Likewise,
PWT and WB/WT must agree (PWT Low, WB/WT High) in order
for a cacheable line to be cached in the writeback state.

The processor can drive another cycle (in this example, a write 
cycle) as early as two clocks after the assertion of BRDY. A 
dead (or idle) clock is thus guaranteed between any two bus 
cycles. As in the read cycle, neither the address nor the cycle- 
definition signals are valid until the processor asserts ADS, 
and the value driven on A31-A3 is valid only during the asser- 
tion of ADS. 

This example shows a parity error during the read cycle, as 
indicated by the processor’s assertion of PCHK two clocks 
after BRDY. Because system logic asserts PEN during the 
BRDY, the processor latches the physical address and cycle
definition of the failed bus cycle in its 64-bit machine-check 
address register (MCAR) and its 64-bit machine-check type 
register (MCTR). For details on such parity errors, see the 
descriptions of PCHK and PEN on pages 5-101 and 5-102. 

While Figure 5-2 shows BRDY returned in the next clock after 
ADS, most DRAM-based systems add wait states (idle clocks) 
between ADS and BRDY. 
Single-Transfer Memory Write Delayed by EWBE Signal



Figure 5-3 shows two consecutive memory writes. The first 
write fills an external write buffer and the second write is 
stalled for three clocks by the negation of EWBE. 

For writes, system logic can store the address and data in a 
write buffer, return BRDY, and perform the store to memory 
later. If the number of outstanding writes exceeds the size of 
the write buffer, system logic must negate EWBE to prevent 
the processor from sending additional writes until EWBE is 
asserted. The advantage of negating EWBE as opposed to not 
asserting BRDY is that negating EWBE prevents only write 
requests, but not asserting BRDY stalls the bus and prevents 
all requests. 
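The buffering scheme described above can be modeled from the system-logic side. The Python sketch below is a hypothetical write buffer, not a specification of EWBE timing: it tracks EWBE as a Boolean (asserted or negated) rather than a pin level, and it assumes the simplest policy of negating EWBE only when every entry is occupied.

```python
from collections import deque


class PostedWriteBuffer:
    """Hypothetical system-logic write buffer that throttles writes with EWBE."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    @property
    def ewbe_asserted(self):
        # Keep EWBE asserted while there is room; negate it when the buffer is
        # full so the processor holds further writes instead of stalling the bus.
        return len(self.pending) < self.depth

    def post_write(self, addr, data):
        """Accept a write and return BRDY at once (True); refuse it if full (False)."""
        if not self.ewbe_asserted:
            return False  # a correctly throttled processor would not issue this write
        self.pending.append((addr, data))
        return True

    def drain_one(self):
        """Retire the oldest posted write to memory; return it, or None if empty."""
        return self.pending.popleft() if self.pending else None
```

Because only writes are held off, reads and other bus cycles can still proceed while the buffer drains.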

More specifically, if EWBE is negated with or after the last 
BRDY of a write cycle, the processor will not do any of the fol- 
lowing: 

■ Write a store-buffer entry to the data cache 

■ Write to memory (single-transfer or burst), including locked 
write to Accessed (A) bit after TLB load 

■ Write to I/O (OUTx) 

■ Execute the following instructions: 

MOV to CR0

MOV to CR4, including during a task switch 

WBINVD 

INVLPG 

CPUID 

■ Respond to the following input signals:

FLUSH 

SMI 

■ Respond to any other interrupts or exceptions that cause a 
write to memory, such as pushing state onto the stack or set- 
ting the Accessed bit in a segment descriptor. This may 
include the BUSCHK, NMI, and INTR interrupts. 

For more details, see the description of EWBE on page 5-62. 
Figure 5-3. Single-Transfer Memory Write Delayed by EWBE Signal
I/O Read and Write

Figure 5-4 shows an I/O read followed by an I/O write. The
processor accesses I/O when it executes an I/O instruction (any
of the INx or OUTx instructions). Accesses to memory-mapped
I/O ports appear on the bus as accesses to memory rather than
to the I/O address space.

The I/O-cycle protocol is nearly the same as the protocol for
read and write accesses to memory, shown in Figure 5-2, except
that M/IO = 0. Only data (not code) can be read or written from
the I/O address space. The cycle definition for an I/O code read
(D/C = 0, M/IO = 0, W/R = 0) defines an interrupt acknowledge
cycle, and the cycle definition for an I/O code write (D/C = 0,
M/IO = 0, W/R = 1) defines a special bus cycle.

The example in Figure 5-4 shows a single wait state separating 
ADS and BRDY for the read. In actual systems, however, the 
time will typically be longer. 




Figure 5-4. I/O Read and Write 
Single-Transfer Misaligned Memory and I/O Transfers



Figure 5-5 shows a misaligned (split) memory read followed by 
a misaligned I/O write. (For a definition of misaligned, see Sec- 
tion 5.3.3 on page 5-137.) When the processor encounters a mis- 
aligned access, it determines the appropriate pair of bus 
cycles — each with its own ADS and BRDY — required to com- 
plete the access. 

In this example, the first pair of bus cycles represents a mem- 
ory read of the doubleword at 800Eh. This access crosses a dou- 
bleword boundary, so it is misaligned. The processor first reads 
the word at 800Eh, followed by the word at 8010h. The second 
pair of bus cycles represents a write of a doubleword to I/O 
address 8Eh. This transfer also crosses a doubleword bound- 
ary, so it is misaligned. The processor writes the word to I/O 
address 90h, followed by the word to I/O address 8Eh. 

The AMD-K5 processor performs misaligned memory read, 
memory write, and I/O read transfers in the reverse order of 
the Pentium processor, but misaligned I/O write transfers are 
performed in the same order on both processors. Table 5-20 
shows the order. Thus, in this example, the I/O write accesses 
the most-significant bytes first followed by the least-significant 
bytes, the opposite order from the memory accesses and I/O 
reads. 



Table 5-20. Bus-Cycle Order During Misaligned Transfers

Type of Access  First K5 Cycle  Second K5 Cycle  Pentium Compatible?
Memory Read     LSBs            MSBs             no
Memory Write    LSBs            MSBs             no
I/O Read        LSBs            MSBs             no
I/O Write       MSBs            LSBs             yes



The SCYC (Split Cycle) output has no meaning in unlocked 
misaligned transfers. It is only meaningful in locked mis- 
aligned transfers. 
5.4.3 Burst Cycles

The processor drives burst cycles, which consist of four sequen- 
tial eight-byte (quadword) transfers on the data bus, only in 
the following cases: 

■ Burst Read — Cache-line fills from memory. These burst 
reads occur when the processor asserts CACHE during ADS 
and system logic asserts KEN during the first BRDY of a 
read cycle. 

■ Burst Write — Writebacks to memory of modified cache lines. 
Writebacks can be caused by (a) externally initiated 
inquire cycles or FLUSH operations, (b) processor-initiated 
internal snoops or cache-line replacements, or (c) program- 
initiated WBINVD instructions. 

Writethroughs to memory, which occur in response to write 
misses or write hits to shared cache lines, are driven as single- 
transfer bus cycles. 

Burst Read

Figure 5-6 shows two consecutive burst reads. During burst
reads (CACHE asserted with ADS, and KEN asserted with the
first BRDY of a memory read), the processor drives BE7-BE0
with ADS to identify the bytes of the desired instruction or
operand. The processor drives BE7-BE0 with the desired bytes
at that time because it does not yet know whether the read will
be a single transfer or a burst; this depends on how system logic
drives KEN with the first BRDY. If system logic negates KEN, it
must return, as a single transfer, only the bytes specified on
BE7-BE0. If system logic asserts KEN, it must ignore BE7-BE0
during all transfers of the burst and return all eight bytes for
the starting address on A31-A3. BE7-BE0 do not change during
the four transfers of the burst. (This behavior is unlike the 486
processor, which drives BE3-BE0 separately for each transfer
of a burst.) System logic must determine the successive
quadword addresses for each transfer in a burst, depending on
the starting address, as shown in Table 5-21.
Table 5-21. Address-Generation Sequence During Bursts

Address Driven by          Addresses of Subsequent Quadwords (1)
Processor on A31-A3        Generated by System Logic
Quadword 1                 Quadword 2    Quadword 3    Quadword 4
...00h                     ...08h        ...10h        ...18h
...08h                     ...00h        ...18h        ...10h
...10h                     ...18h        ...00h        ...08h
...18h                     ...10h        ...08h        ...00h

Notes:

1. quadword = 8 bytes
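Every row of Table 5-21 follows one rule: the quadword index within the 32-byte cache line (A4-A3) is XORed with 0, 1, 2, and 3 in turn. The Python sketch below generates the sequence from the starting address; burst_order is a hypothetical name, and addresses are handled as byte addresses.

```python
def burst_order(start):
    """Quadword byte addresses of a burst, in transfer order (Table 5-21)."""
    line = start & ~0x1F  # 32-byte-aligned cache-line base
    first = (start >> 3) & 0x3  # A4-A3 of the starting quadword
    return [line | ((first ^ i) << 3) for i in range(4)]
```

For example, a burst starting at 1230h transfers the quadwords at 1230h, 1238h, 1220h, and 1228h.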



Starting in the clock after ADS, the four sequential eight-byte
(quadword) transfers can occur on the data bus. System logic
drives each quadword and asserts BRDY to complete that
transfer; the processor then expects the next quadword. In this
example, system logic returns BRDY with no wait states, so the
subsequent quadword is transferred in the next clock. Typical
systems, however, add one or more wait states between the
transfers.

For both read cycles, the processor asserts CACHE with ADS
and system logic asserts KEN with the BRDY of the first
transfer. Thus, CACHE and KEN agree, and the access is cached.
This agreement between CACHE and KEN is required in order
for a burst read to occur; the processor drives burst reads only
if the access is cacheable. If CACHE were negated with ADS, or
KEN were negated with the BRDY of the first transfer, the read
would terminate with the first quadword transfer, thus
becoming a single-transfer read.

In this example, the processor negates PWT (indicating write- 
back state) and system logic drives WB/WT High with the 
BRDY of the first transfer (also indicating writeback state). 
Thus, PWT and WB/WT agree, and the cache line becomes a 
writeback line, which is cached in the exclusive MESI state. 
Details on the writeback/writethrough and MESI cache-coher- 
ency state transitions are given in Table 2-2 on page 2-19. 

In Figure 5-7, the two consecutive burst reads are identical to
those in Figure 5-6, except that system logic asserts NA one
clock before it asserts BRDY in the first read cycle of Figure
5-7. This causes KEN and WB/WT to be sampled when NA
(rather than BRDY) is asserted: KEN and WB/WT are validated
by either NA or BRDY, whichever comes first. If no internal
bus cycle is pending, asserting NA does not generate a
pipelined cycle.







Figure 5-6. Burst Reads 
Figure 5-7. Burst Read (NA Sampled)
Burst Writeback



Figure 5-8 shows a burst read followed by a writeback. Write- 
backs are the only type of burst write that the processor per- 
forms. They can be initiated by the processor or by system 
logic in the following cases: 

■ Processor-Initiated Writebacks: 

• Replacement — If a cache-line fill is initiated when all 
four ways of the cache that could accommodate the in- 
coming line are filled with valid entries, the processor 
uses a round-robin algorithm to select a line for replace- 
ment. Before a replacement is made to a data cache line 
in the modified state, the line is written back to memory. 

• Internal Snoop — The processor snoops the data cache 
whenever an instruction-cache line is read, and it snoops 
the instruction cache whenever a data cache line is writ- 
ten. This snooping is performed to determine whether 
the same address is stored in both caches, a situation 
that is taken to imply the occurrence of self-modifying 
code. If a snoop hits a data cache line in the modified 
state, the line is written back to memory before being in- 
validated. 

• WBINVD Instruction — When the processor executes a 
WBINVD instruction, it writes back all modified lines in 
the data cache and then invalidates all lines in both 
caches. The action taken in response to the WBINVD in- 
struction is essentially the same as the action taken in 
response to the FLUSH input signal, except that the ac- 
knowledge cycles differ. For details, see page 5-185. 

■ System-Initiated Writebacks: 

• Inquire Cycle Hits — If an inquire cycle hits a modified 
line in the data cache, the processor writes back the line. 
For details, see page 5-157. 

• FLUSH — If system logic asserts the FLUSH input, the 
modified lines in the data cache are written back to
memory before the entire contents of both caches are in- 
validated. The action taken in response to the FLUSH 
input signal is essentially the same as the action taken in 
response to the WBINVD instruction, except that the ac- 
knowledge cycles differ. For details, see page 5-183. 

During all processor-initiated writebacks and system-initiated
FLUSH writebacks, the processor asserts ADS, drives a
32-byte-aligned starting address on A31-A3, and enables all
eight bytes (BE7-BE0 = 00h). Thus, A4-A3 are always 0 for
writebacks. During inquire-cycle writebacks, the processor
does the same thing, except that if system logic holds AHOLD
asserted throughout the writeback, the processor lets system
logic provide the address.

The writeback shown in Figure 5-8 is caused by a cache-line 
replacement, which occurs when an attempted burst read finds 
that all four cache ways for that address are filled with valid 
entries. In this case, the processor performs the following 
sequence: 

1. Copies the prior contents of the replacement line to its
32-byte writeback buffer (described in Section 2.3.7 on page
2-23). This is not visible on the bus.

2. Completes the burst read, placing the incoming data into 
the cache line. This is the first burst cycle in Figure 5-8. 

3. Writes the modified line back to memory. This is the second 
burst cycle in Figure 5-8. 

During the burst read (Step 2), the states of PWT and WB/WT 
are the same as in Figure 5-6 and Figure 5-7. 
Figure 5-8. Burst Writeback Due To Cache-Line Replacement
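The three-step replacement sequence can be sketched as the list of bus cycles it makes visible. This is an illustrative model, not the processor's logic: the victim way is supplied by the caller (the replacement policy is not modeled), and it assumes the writeback cycle appears only when the victim line is modified, as in Figure 5-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum bus_cycle { BURST_READ, BURST_WRITEBACK };

typedef struct {
    bool valid;
    bool modified;
} cache_way;

/* Fills out[] with the visible cycles and returns how many there are. */
static size_t line_fill_cycles(cache_way set[4], int victim,
                               enum bus_cycle out[2])
{
    assert(victim >= 0 && victim < 4);
    size_t n = 0;
    /* Step 1: victim copied to the writeback buffer (not on the bus). */
    bool must_write_back = set[victim].valid && set[victim].modified;
    /* Step 2: the burst read completes first and refills the way. */
    out[n++] = BURST_READ;
    set[victim].valid = true;
    set[victim].modified = false;
    /* Step 3: the buffered modified line is written back afterwards. */
    if (must_write_back)
        out[n++] = BURST_WRITEBACK;
    return n;
}
```

Because the victim is buffered before the fill, the read is never delayed by the writeback; the writeback simply follows as a second burst.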
5.4.4 Bus Arbitration and Inquire Cycles

The processor bus may be required by another bus master,
which may need to drive its own cycles on the bus, or by system
logic, which may need to drive an inquire cycle to the processor
or resolve bus deadlock. One of three signals can be used for
these purposes: AHOLD, BOFF, or HOLD. AHOLD's sole function is
to support inquire cycles. It obtains control of only the
address bus and allows another master or system logic to drive
only inquire cycles, whereas BOFF and HOLD obtain control of
the full bus (address and data), allowing another master to
drive not only inquire cycles but also read and write cycles.
BOFF provides the fastest access to the bus, and it aborts any
in-progress cycle by the processor. AHOLD and HOLD both permit
an in-progress bus cycle to complete, but a writeback can occur
while AHOLD is asserted, whereas a writeback that is pending
during the assertion of BOFF or HOLD occurs after BOFF or HOLD
is negated.

In most systems, the choice is between BOFF and AHOLD. Because
of its slow response time, HOLD is usually considered only when
backward compatibility with prior-generation subsystems
requires it or when the integrity of in-progress bus cycles is
of paramount importance. Support for BOFF is usually needed to
resolve potential deadlock problems that arise as a result of
inquire cycles, and if BOFF is supported, there is usually no
reason to support HOLD. The sections that follow describe these
relative advantages and disadvantages further.

In systems with multiple caching masters and shared memory,
system logic can maintain cache coherency by driving inquire
cycles to the processor whenever another bus master accesses
shared memory. Such system-initiated bus cycles cause the
processor to compare the physical tags for both its instruction
and data caches with the inquire address, in parallel with any
cache accesses the processor makes via its linear tags. If a
match is found, the processor writes the cache line back to
memory if it is modified, and changes the MESI state according
to the state of the INV input signal during the inquire cycle.

The system logic's sequence for driving inquire cycles is:

1. Assert AHOLD to obtain control of the address bus, or
assert either BOFF or HOLD to obtain control of the entire
bus.

2. Two clocks after the assertion of BOFF or AHOLD, or one
clock after sampling HLDA asserted when HOLD is used,
assert EADS while driving a cache-line address on A31-A5,
and assert or negate INV. The processor latches the address
when it samples EADS asserted.

3. Wait two clocks, watching for HITM and/or HIT to be
asserted:

If neither HIT nor HITM is asserted at the end of two
clocks, or if only HIT is asserted, the inquire cycle
terminates.

If HITM is asserted, a writeback follows and the processor
does not recognize EADS again until the last BRDY of the
writeback. The timing of the writeback depends on whether
AHOLD, BOFF, or HOLD was asserted to gain access to the
bus. If AHOLD was used, the processor begins driving the
four-transfer burst writeback as early as two clocks after
asserting HITM, whether or not AHOLD is still asserted. If
BOFF or HOLD was used, the processor delays the writeback
until just after BOFF or HLDA is negated.

The resulting state of a cache line that is hit by an inquire
cycle depends on the state of the INV signal at the time of the
inquire cycle (see Table 5-11 on page 5-71). If INV is negated,
the line remains in or transitions to the shared state. If INV is
asserted, the line is written back, if modified, and transitions
to the invalid state.

AHOLD-Initiated Inquire Miss

Figure 5-9 shows a burst read, during which system logic
asserts AHOLD to acquire the address bus for an inquire cycle.
The processor floats the address bus one clock after AHOLD is
asserted, although the data bus continues to return data from
the in-progress burst read. (The processor supports only one
in-progress bus cycle. No pending bus cycles are buffered.) Two
clocks after asserting AHOLD, system logic initiates the
inquire cycle by asserting EADS, driving INV (negated in this
example), and driving the inquire address on A31-A5.

Although the inquire cycle misses the cache (HIT is negated 
two clocks after EADS), the processor’s assertion of APCHK 
two clocks after EADS indicates that a parity error occurred on 
the inquire cycle address. Because of this parity error, system 
logic should disregard the result of the inquire cycle and per- 
form it again. 
For an AHOLD inquire cycle to be recognized, AHOLD must
have been asserted continuously for two clocks at the time
EADS is asserted. AHOLD and BOFF can be asserted in
conjunction with each other without interfering with EADS
recognition, as long as the sampling criteria for at least one
of the signals (AHOLD or BOFF) are met.




Figure 5-9. AHOLD-Initiated Inquire Miss
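The two-clock qualification rule for EADS can be checked against per-clock samples in a bus-trace checker. This is a sketch under one reading of the rule: AHOLD or BOFF must have been sampled asserted in both of the two clocks before the clock in which EADS is sampled. The sample arrays and function names are assumptions of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if sig was asserted in the two clocks preceding clock t. */
static bool held_two_clocks(const bool *sig, size_t t)
{
    return t >= 2 && sig[t - 1] && sig[t - 2];
}

/* t is the clock in which EADS is sampled. */
static bool eads_recognized(const bool *ahold, const bool *boff,
                            const bool *eads, size_t t)
{
    if (!eads[t])
        return false;
    /* Either signal may satisfy the rule; asserting both is harmless. */
    return held_two_clocks(ahold, t) || held_two_clocks(boff, t);
}
```

With AHOLD first asserted in clock 1, an EADS in clock 2 would not qualify under this reading, while an EADS in clock 3 would.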
AHOLD-Initiated Inquire Hit to Shared or Exclusive Line

Figure 5-10 shows an example similar to Figure 5-9, minus the
address parity error, but this inquire cycle hits either a
shared or exclusive line in the cache, as indicated by the
assertion of HIT and the negation of HITM two clocks after the
assertion of EADS. The processor invalidates the cache line
because system logic asserts INV with EADS. The processor may
drive a new bus cycle as early as one clock after system logic
negates AHOLD.




Figure 5-10. AHOLD-Initiated Inquire Hit to Shared or Exclusive Line
AHOLD-Initiated Inquire Hit to Modified Line



Figure 5-11 shows the same sequence as in Figure 5-10, but this
time the inquire cycle hits a modified line. As in Figure 5-10,
system logic asserts INV with EADS. Two clocks later, the
processor asserts both HIT and HITM. A few clocks later, the
processor drives a writeback for the cache line and then
invalidates its cached copy. The processor holds HITM
asserted until one clock after the last BRDY of the writeback.

If system logic holds AHOLD asserted throughout an inquire
cycle and any required writeback, system logic must latch the
inquire cycle address when it asserts EADS. This is required
because, if the inquire cycle hits a modified line, the
processor does not drive the writeback address when it asserts
ADS for the writeback. Instead, A31-A5 remains an input-only
bus, and system logic must use its latched copy of the inquire
cycle address. By contrast, if system logic always negates
AHOLD before the writeback, the processor drives the writeback
address when it asserts ADS for the writeback, and system logic
need not retain a copy of the inquire cycle address. When the
processor drives the writeback address, it drives only the
starting address of the 32-byte transfer on A31-A5. System
logic must determine the remaining addresses as shown in
Table 5-21 on page 5-150.
Figure 5-11. AHOLD-Initiated Inquire Hit to Modified Line
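System logic that generates the remaining burst addresses itself could use a helper like the one below. This is a sketch that assumes the x86 interleaved burst order (the first address XORed with 8 times the transfer index, within the 32-byte line); Table 5-21 remains the authoritative reference. Because a writeback always starts on a 32-byte boundary, the sequence for writebacks reduces to +00h, +08h, +10h, +18h.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: interleaved burst order; verify against Table 5-21. */
static void burst_addresses(uint32_t start, uint32_t out[4])
{
    uint32_t line  = start & ~(uint32_t)0x1F;  /* 32-byte line base */
    uint32_t first = start & 0x18u;            /* A4-A3 of transfer 0 */
    for (uint32_t i = 0; i < 4; i++)
        out[i] = line | (first ^ (i << 3));
}
```

For a writeback starting at 0001_2340h, this yields 0001_2340h, 0001_2348h, 0001_2350h, and 0001_2358h.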
Bus Backoff (BOFF)



BOFF provides the fastest response of the three bus-hold
inputs. Unlike AHOLD and HOLD, BOFF does not permit an
in-progress bus cycle to complete. It forces the processor off
the bus in the next clock, aborting any in-progress bus cycle
that the processor may have begun.

Figure 5-12 shows a burst read interrupted by BOFF. One clock
after sampling BOFF asserted, the processor aborts the entire
in-progress burst read and floats its bus. All output and
bidirectional signals used for memory or I/O accesses are
floated. The processor ignores all data and BRDYs returned by
the system during the aborted cycle. This is unlike BOFF on the
486 processor, which retains the data that had been transferred
up to the clock in which BOFF was asserted. BOFF has no effect
on writes to the processor's store buffer, except to delay them.
(The store buffer is situated between the execution units and
the data cache and holds speculative stores until they are
written, in non-speculative state, to the data cache.)

Another bus master can begin driving cycles as early as two 
clocks after BOFF is asserted. System logic or another bus mas- 
ter may continue asserting BOFF for as long as it wants. The 
processor has no way of breaking the hold. While the processor 
is backed off, it continues to execute out of its instruction and 
data caches, if possible. If it can no longer operate out of its 
caches, it holds BREQ asserted continuously. 

As early as one clock after BOFF is negated, the processor 
restarts — from the beginning — any bus cycle that was aborted 
when BOFF was asserted. This is unlike BOFF on the 486 pro- 
cessor, which restarts only the transfers that did not complete 
when BOFF was asserted. The processor can drive another 
cycle with ADS as early as two clocks after any aborted cycle 
completes. This allows one idle clock (also called a dead clock) 
between any two bus cycles. If BOFF was asserted when ADS 
was also asserted, however, ADS remains Low (floats asserted) 
after BOFF is negated. In such a case, system logic must prop- 
erly interpret the state of ADS when it negates BOFF. 
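The contrast with the 486 in the paragraphs above can be expressed as the number of transfers of an aborted four-transfer burst that run again after BOFF is negated. This is a minimal sketch; the parameter name `completed` (transfers that received BRDY before the abort) is illustrative.

```c
#include <assert.h>

enum cpu_family { CPU_486, CPU_AMD_K5 };

static int transfers_rerun(enum cpu_family cpu, int completed)
{
    assert(completed >= 0 && completed < 4);  /* the cycle was aborted */
    if (cpu == CPU_486)
        return 4 - completed;  /* 486 keeps data already transferred */
    return 4;                  /* K5 discards it and restarts the cycle */
}
```

Because the AMD-K5 processor restarts the whole cycle, system logic never has to resume a burst partway through.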

Because of its ability to help resolve deadlock problems, BOFF 
is required in virtually all systems with multiple caching mas- 
ters. In such designs, system logic typically drives separate 
BOFF signals to each bus master in the system. See Section 
6.2.5 on page 6-14 for system configurations using BOFF. 
Figure 5-12. Basic BOFF Operation



Bus Cycle Timing 
BOFF-Initiated Inquire Hit to Modified Line



Figure 5-13 shows a burst read interrupted by the assertion of
BOFF for the purpose of an inquire cycle. One clock after
sampling BOFF asserted, the processor aborts the burst read and
floats its bus. Two clocks after asserting BOFF, system logic
initiates the inquire cycle by asserting EADS and INV, and
driving the inquire address on A31-A5. The processor asserts
both HIT and HITM two clocks after EADS, thus indicating that
the inquire hit a modified cache line. The writeback cannot
occur while BOFF is asserted, however, because the processor
has floated its data and control outputs.

After BOFF is negated, the processor writes back the modified
cache line, holding HITM asserted until one clock after the last
BRDY of the writeback. Because INV was asserted with EADS, the
cache line is invalidated after its writeback. Then, the
processor restarts the aborted burst read from the beginning.

For a BOFF inquire cycle to be recognized, BOFF must have
been asserted continuously for two clocks at the time EADS is
asserted. AHOLD and BOFF can be asserted in conjunction with
each other without interfering with EADS recognition, as long
as the sampling criteria for at least one of the signals
(AHOLD or BOFF) are met.
Figure 5-13. BOFF-Initiated Inquire Hit to Modified Line




AMD-K5 Processor Technical Reference Manual 



18524C/0- Nov 1996 



HOLD-Initiated Inquire Hit to Shared or Exclusive Line



Figure 5-14 shows HOLD asserted in the same clock that the
processor begins a read cycle. The processor completes the
read (which is a burst read) and asserts HLDA two clocks after
the last BRDY of the in-progress cycle. It also floats all output
and bidirectional signals used for memory or I/O accesses at
the same time it asserts HLDA.

In the next clock after sampling HLDA asserted, system logic
initiates an inquire cycle by asserting EADS and INV and
driving an inquire address on A31-A5. The inquire cycle hits a
shared or exclusive line (HIT asserted and HITM negated two
clocks after EADS), and the processor invalidates the cache
line (not visible on the bus). System logic negates HOLD in the
clock after EADS, and two clocks later (one clock after HIT
and HITM transition) the processor negates HLDA and continues
with its other bus cycles.

If EADS is asserted in the same clock that HOLD is negated, 
the processor recognizes this as a valid inquire cycle and han- 
dles it correctly. However, if EADS is asserted in the clock fol- 
lowing the negation of HOLD, the processor does not recognize 
this as a valid inquire cycle. 
Figure 5-14. HOLD-Initiated Inquire Hit to Shared or Exclusive Line
HOLD-Initiated Inquire Hit to Modified Line

Figure 5-15 shows an example similar to the one in Figure 5-14,
except that the inquire cycle hits a modified line (both HIT and
HITM asserted two clocks after EADS). System logic negates
HOLD in the clock after EADS, and two clocks later (one clock
after HIT and HITM transition) the processor negates HLDA.
As early as one clock after negating HLDA, the processor
asserts ADS to drive the writeback, after which the processor
invalidates its copy of the line.







Figure 5-15. HOLD-Initiated Inquire Hit to Modified Line
5.4.5 Locked Cycles

The processor asserts LOCK across certain sequences of mem- 
ory bus cycles that require integrity. These include interrupt 
acknowledge operations, descriptor-table updates, page-direc- 
tory and page-table updates, and exchange operations. In addi- 
tion, the processor asserts LOCK during bus cycles initiated by 
any instruction that has the LOCK prefix. The processor locks 
only memory cycles, not I/O cycles. 

LOCK is an indication to system logic that it should maintain 
the integrity of the locked bus cycles, either by never interven- 
ing in them or by some other system-level memory protection 
mechanism that guarantees integrity. 

Locked operations generated by the processor typically consist 
of a read-write pair of bus cycles with an operand modification 
between the two bus cycles (sometimes called read-modify- 
write), except that interrupt acknowledge operations, which 
are also locked, consist of a pair of read cycles with no operand 
modification between the cycles. Locked operations generated 
by the LOCK instruction prefix cause LOCK to be asserted 
only during bus cycles initiated by that single instruction. The 
processor guarantees at least one idle (or dead) clock between 
consecutive bus cycles, whether unlocked or locked. This 
means that consecutive locked operations, which consist of 
consecutive bus cycles, also have at least one idle clock 
between them. 

Figure 5-16 shows a pair of read-write bus cycles. The proces- 
sor asserts LOCK with the ADS of the first bus cycle in the 
locked operation, and holds it asserted until the last expected 
BRDY of the last bus cycle in the locked operation. Between 
the locked operations, the processor negates LOCK for at least 
one clock. 

This example also shows that the value driven on A31-A3 is 
valid only during the assertion of ADS. In the clock immedi- 
ately preceding the ADS for the write in the first locked opera- 
tion, the processor changes the address. If system logic reads 
the address in the clock before ADS, an unexpected value may 
be returned. 



Basic Locked Operation

Figure 5-16. Basic Locked Operation
TLB Miss (4-Kbyte Page)



Figure 5-17 shows a TLB miss for a 4-Kbyte page. An overview 
of the 4-Kbyte paging mechanism is illustrated in Figure 3-2 on 
page 3-5. The paging mechanism for 4-Mbyte pages (Figure 3-3 
on page 3-6) is similar but somewhat simpler. The processor 
has separate TLBs for the two page sizes. 

If an address for an access cannot be found in the processor’s 
linearly addressed instruction or data cache, the TLB (which 
helps translate linear addresses to physical addresses) is 
searched for the entry associated with the accessed page. A 
TLB miss occurs if the entry cannot be found. For accesses to a 
4-Kbyte page that miss the TLB, the processor accesses first 
the page-directory entry (PDE) in memory and then the page- 
table entry (PTE) in memory to check, and if necessary set, 
their Accessed (A) bits. During a write access (not shown in 
this example), the processor also checks and, if necessary, sets 
the PTE Dirty (D) bit. 

The general sequence, both for PDE and PTE, is as follows for 
accesses to a 4-Kbyte page: 

■ The processor drives an unlocked read of the PDE or PTE to 
see if the relevant bit (A or D) is set. 

■ If the bit is cleared (0), the processor then drives a locked 
read-modify-write (four-byte read followed by four-byte 
write) to set the bit. 
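The general sequence above can be sketched as the list of bus cycles a 4-Kbyte-page TLB miss generates. This is an illustrative model, not the processor's microcode. It uses the standard x86 bit positions (A = bit 5, D = bit 6). It also assumes that, for a write, A and D in the PTE are set by a single locked read-modify-write when either is clear; the text above does not say whether two separate updates occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_A (1u << 5)  /* Accessed bit (same position in the PDE) */
#define PTE_D (1u << 6)  /* Dirty bit */

enum walk_cycle { READ_UNLOCKED, LOCKED_READ, LOCKED_WRITE };

static size_t update_entry(uint32_t *entry, uint32_t need,
                           enum walk_cycle *out, size_t n)
{
    out[n++] = READ_UNLOCKED;       /* check the relevant bit(s) */
    if ((*entry & need) != need) {  /* a bit is clear, so set it */
        out[n++] = LOCKED_READ;     /* 4-byte read, LOCK asserted */
        *entry |= need;
        out[n++] = LOCKED_WRITE;    /* 4-byte write, LOCK still asserted */
    }
    return n;
}

/* Returns the number of cycles written to out[] (at most 6). */
static size_t tlb_miss_cycles(uint32_t *pde, uint32_t *pte, bool is_write,
                              enum walk_cycle out[6])
{
    size_t n = update_entry(pde, PTE_A, out, 0);
    return update_entry(pte, is_write ? (PTE_A | PTE_D) : PTE_A, out, n);
}
```

With the PDE's A bit already set and the PTE's A bit clear, a read produces the same four cycles as in the example that follows: PDE read, PTE read, then the locked PTE read-write pair.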

The example in Figure 5-17 shows the following specific 
sequence: 

■ Read The PDE — The A bit in the PDE is set, so nothing fur- 
ther is done with the PDE. 

■ Read The PTE — The A bit in the PTE is cleared, indicating
that the page has not been accessed since the operating
system last cleared the bit.

■ Set The Accessed Bit — The processor performs a locked read- 
write pair of bus cycles to set the A bit. The diagram shows 
these cycles as a 4-byte PTE read followed by a 4-byte PTE 
write. It asserts LOCK with the ADS of the read cycle and 
holds it asserted until the BRDY of the write cycle. 

■ Read The Desired Location (Cache-Line Fill) — The processor 
reads the location that caused the TLB miss, filling a cache 
line as a result of the access. 
Figure 5-17. TLB Miss (4-Kbyte Page)
Locked Operation with BOFF Intervention



Unlike AHOLD and HOLD, BOFF does not permit an in-progress
bus cycle to complete. It forces the processor off the bus in the
next clock, aborting any in-progress bus cycle that the
processor may have begun. If BOFF is asserted during a locked
operation, only the cycle(s) aborted before their last BRDY and
the cycles not yet run are restarted after BOFF is negated.
Thus, system logic must keep track of all cycles in the locked
operation that completed before the assertion of BOFF and must
continue the locked operation immediately after BOFF is
negated, except that if a writeback is pending when BOFF is
negated, the writeback takes precedence over the restarting of
the aborted cycles in the locked operation.

Figure 5-18 shows the effect of BOFF intervening in a locked
read-write pair of bus cycles. The example begins with the
read, while LOCK is asserted. System logic asserts BOFF while
the processor is asserting ADS for the write, causing the
processor to abort the write and float its bus in the next clock.
Another bus master must wait two clocks after the assertion of
BOFF before driving its first bus cycle, because the processor
does not float its outputs until one clock after the assertion of
BOFF.

When system logic relinquishes the bus by negating BOFF, the
processor almost immediately drives the bus again, with LOCK
asserted, and restarts the aborted write access by asserting
ADS as early as one clock after BOFF is negated.

System logic should ensure that the processor results for inter- 
rupted and uninterrupted locked cycles are consistent. That is, 
system logic must guarantee that the memory accessed by the 
processor is not modified during the time another bus master 
controls the bus. 
Figure 5-18. Locked Operation with BOFF Intervention
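The restart rule for a locked operation interrupted by BOFF can be sketched as the order of cycles the processor runs after BOFF is negated. Cycles that received their last BRDY are not repeated, and a pending writeback goes first. The numbering scheme (0 is the first cycle of the locked operation) and the `WRITEBACK_FIRST` marker are hypothetical conventions of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WRITEBACK_FIRST (-1)  /* marker for a pending writeback */

static size_t cycles_after_boff(size_t n_cycles, size_t n_completed,
                                bool writeback_pending, int *out)
{
    assert(n_completed < n_cycles);  /* BOFF hit mid-operation */
    size_t n = 0;
    if (writeback_pending)
        out[n++] = WRITEBACK_FIRST;
    for (size_t i = n_completed; i < n_cycles; i++)
        out[n++] = (int)i;           /* aborted cycle, then unrun ones */
    return n;
}
```

For the case in Figure 5-18 (read completed, write aborted, no writeback pending), only cycle 1, the write, is run again.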
Interrupt Acknowledge Operation



Figure 5-19A shows system logic asserting INTR during a burst 
read. The figure shows the resulting bus behavior, up to the 
start of the interrupt handler. When the processor recognizes 
an INTR interrupt at the next instruction-retirement bound- 
ary, the processor performs the following actions: 

■ Finish In-Progress Bus Cycle — In Figure 5-19A, a burst read is 
in progress when system logic asserts INTR. The processor 
supports only one such in-progress bus cycle. 

■ Flush Instruction Pipeline — This is not visible on the bus. 

■ Acknowledge Interrupt — The interrupt acknowledge opera- 
tion consists of a locked pair of reads, as shown in Table 
5-22. The first read is not functional (a protocol relic). The 
second read returns the interrupt vector in D7-D0. (The 
interrupt vector is an offset into an interrupt table.) System 
logic must return a BRDY in response to both cycles. The 
processor inserts at least one idle clock between the locked 
reads. 

■ System logic will typically not be able to determine the 
instruction boundary on which the processor recognizes 
INTR. Thus, as a practical matter, system logic should hold 
INTR asserted until the beginning of the interrupt acknowl- 
edge operation, or until there is some other evidence that 
the interrupt service routine has been entered (for exam- 
ple, the access to the interrupt-table address). 



Table 5-22. Interrupt Acknowledge Operation Definition

Processor Outputs | First Bus Cycle | Second Bus Cycle
D/C               | 0               | 0
M/IO              | 0               | 0
W/R               | 0               | 0
BE7-BE0           | EFh             | FEh (low byte enabled)
A31-A3            | 0               | 0
D63-D0            | (ignored)       | Interrupt vector expected from
                  |                 | interrupt controller on D7-D0



■ Disable Maskable Interrupts — The processor does this under 
certain conditions (see Section 5.2.32 on page 5-84 for 
details), and it is not visible on the bus. 
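A system-logic model might recognize the two locked reads of Table 5-22 as shown below. This is a sketch: `bus_cycle` is a hypothetical snapshot of the cycle-definition outputs taken at ADS, using the logical values from the table. The interrupt controller returns the vector on D7-D0 for the cycle classified as `INTA_SECOND`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool dc, mio, wr;  /* D/C, M/IO, W/R */
    uint8_t be;        /* BE7-BE0 (active low) */
    uint32_t addr;     /* A31-A3 as a byte address */
} bus_cycle;

enum inta_kind { NOT_INTA, INTA_FIRST, INTA_SECOND };

static enum inta_kind classify_inta(const bus_cycle *c)
{
    if (c->dc || c->mio || c->wr || c->addr != 0)
        return NOT_INTA;
    if (c->be == 0xEF)
        return INTA_FIRST;   /* data ignored, BRDY still required */
    if (c->be == 0xFE)
        return INTA_SECOND;  /* return the vector on D7-D0 */
    return NOT_INTA;
}
```

Only W/R distinguishes these reads from special bus cycles, which use the same D/C and M/IO values with W/R = 1.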

As shown in Figure 5-19B and Figure 5-19C, following the
interrupt acknowledge operation and a quiet period during
which the processor executes housekeeping microcode, the
processor prepares to service the interrupt by performing the
following accesses on the bus:

■ IDT Lookup — Using the interrupt vector and, in Protected
mode, the base address of the interrupt descriptor table
(IDT) held in the interrupt descriptor table register (IDTR),
the processor performs a read on the bus to look up the
8-byte IDT entry. In Figure 5-19B, this appears as a burst
read, which is cached.

■ GDT Lookup — Using the segment selector from the IDT
entry, the processor performs another read, this time of the
global descriptor table (GDT), to look up the 8-byte code
segment descriptor. This also appears as a burst read, which
is cached. Alternatively, this read can access the local
descriptor table (LDT) rather than the GDT.

■ Write to Stack — As shown in Figure 5-19C, the processor
saves the EFLAGS, CS, and EIP registers on the stack.
These saves appear as three single writes.

■ Code Fetch for Interrupt Handler — Finally, using the base 
address from the GDT descriptor and the offset from the 
IDT descriptor, the processor locates the interrupt handler 
in the code segment (CS) and begins fetching the code in 
cacheable burst reads. 
Figure 5-19C. Interrupt Acknowledge Operation Part 3
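The address arithmetic behind the IDT and GDT lookups can be sketched for Protected mode. The sketch assumes the standard x86 layouts: 8-byte IDT and descriptor-table entries, and a selector whose bits 15-3 form the index and whose bit 2 (TI) selects the LDT. Limit checks and paging are ignored, so the results are linear rather than physical addresses.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t idt_entry_addr(uint32_t idtr_base, uint8_t vector)
{
    return idtr_base + (uint32_t)vector * 8u;
}

static uint32_t descriptor_addr(uint16_t selector, uint32_t gdt_base,
                                uint32_t ldt_base)
{
    uint32_t table = (selector & 0x4u) ? ldt_base : gdt_base; /* TI bit */
    return table + (uint32_t)(selector & 0xFFF8u);  /* index * 8 */
}

/* First handler fetch: code segment base plus the gate's offset. */
static uint32_t handler_linear(uint32_t cs_base, uint32_t gate_offset)
{
    return cs_base + gate_offset;
}
```

For example, vector 21h with an IDT based at 0000_1000h reads its gate from 0000_1108h.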
5.4.6 Special Bus Cycles

The processor drives D/C = 0, M/IO = 0, and W/R = 1 to define a
special bus cycle. The values of these cycle-definition signals
are the same for all special cycles. Only BE7-BE0 and A31-A3
differentiate among the special cycles, as shown in Table 5-23.

This function of BE7-BE0 bears no relationship to the D63-D0
data bus. This is particularly apparent in the branch-trace
message special bus cycle, during which the value of BE7-BE0
is DFh (1101_1111b) but, contrary to what the byte-enable
bits indicate, the four bytes on D31-D0 carry valid data during
both cycles of the operation. During the first cycle, D31-D0
carries the EIP value of the source (branch) instruction. During
the second cycle, D31-D0 carries the EIP value of the
branch-target instruction.



Table 5-23. Encodings For Special Bus Cycles

BE7-BE0 | A31-A3 | Special Bus Cycle (1)            | Cause
FEh     | ...00h | Shutdown                         | Triple fault
FDh     | ...00h | Cache Invalidation               | INVD instruction
FBh     | ...10h | Stop Grant                       | STPCLK
FBh     | ...00h | Halt                             | HLT instruction
F7h     | ...00h | Cache Writeback and Invalidation | WBINVD instruction
EFh     | ...00h | FLUSH Acknowledge                | FLUSH
DFh     | ...00h | Branch-Trace Message (2)         | Bit 5 = 1 and bits 3-1 = 001
        |        |                                  | in the hardware configuration
        |        |                                  | register (HWCR). See Section
        |        |                                  | 7.1 on page 7-3 for details.

Notes:

1. For all special bus cycles, D/C = 0, M/IO = 0, and W/R = 1. System logic must return BRDY in response to this cycle.

2. The message in a branch-trace message special bus cycle is different in the AMD-K5 and Pentium processors.
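A decoder for Table 5-23 might look like the sketch below. It assumes the caller has already matched D/C = 0, M/IO = 0, and W/R = 1 at ADS, and that `addr_low` is the low byte of A31-A3 taken as a byte address. Only FBh needs the address to separate Stop Grant (10h) from Halt (00h). Branch-trace cycles carry other fields on the address bus (see Table 5-24), so only BE7-BE0 is checked for them.

```c
#include <assert.h>
#include <stdint.h>

enum special_cycle {
    SC_UNKNOWN, SC_SHUTDOWN, SC_INVD, SC_STOP_GRANT, SC_HALT,
    SC_WBINVD, SC_FLUSH_ACK, SC_BRANCH_TRACE
};

static enum special_cycle decode_special(uint8_t be, uint8_t addr_low)
{
    switch (be) {
    case 0xFE: return SC_SHUTDOWN;
    case 0xFD: return SC_INVD;
    case 0xFB:
        if (addr_low == 0x10) return SC_STOP_GRANT;
        if (addr_low == 0x00) return SC_HALT;
        return SC_UNKNOWN;
    case 0xF7: return SC_WBINVD;
    case 0xEF: return SC_FLUSH_ACK;
    case 0xDF: return SC_BRANCH_TRACE;
    default:   return SC_UNKNOWN;
    }
}
```

Whatever the decoded type, system logic must still return BRDY for every special cycle (Note 1).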
Basic Special Bus Cycle

Figure 5-20 shows a basic special bus cycle, which is defined
during ADS by D/C = 0, M/IO = 0, and W/R = 1 and
differentiated by BE7-BE0 and A31-A3. In this example,
BE7-BE0 = FBh and A31-A3 = 0, so it is the special cycle the
processor generates after executing a HLT instruction. System
logic must respond with BRDY.

All special bus cycles serialize the pipeline. EWBE is not 
checked prior to running special bus cycles (all of which have 
W/R = 1), so EWBE has no effect on any special bus cycles. 
Figure 5-20. Basic Special Bus Cycle (Halt Cycle)
Shutdown Cycle

Figure 5-21 shows a shutdown and the special cycle that
follows. The processor enters shutdown when an interrupt or
exception occurs during the handling of a double fault (vector
8), which amounts to a triple fault. When the processor
encounters such a triple fault, it stops its activity on the bus
and generates the special bus cycle for shutdown
(BE7-BE0 = FEh). System logic must respond with BRDY.

System logic must assert NMI, INIT, RESET, or SMI to get the 
processor out of the Shutdown state. 
Figure 5-21. Shutdown Cycle
FLUSH-Acknowledge Cycle

Figure 5-22 shows the FLUSH-acknowledge special bus cycle,
which the processor drives in response to system logic's
assertion of FLUSH. This example shows the processor
completing other, unrelated bus cycles following the assertion
of FLUSH. These bus cycles are caused by the execution of
instructions earlier in the pipeline, which complete execution
before the processor recognizes FLUSH on the next
instruction-retirement boundary.

FLUSH causes the processor to write back all modified lines in
its data cache. Only one such writeback is shown in this
example. After all writebacks complete, the processor
invalidates all lines in both of its caches. Then, the processor
generates the FLUSH-acknowledge special bus cycle
(BE7-BE0 = EFh) to indicate that the writebacks and
invalidation have completed. System logic must respond by
asserting BRDY.
Figure 5-22. FLUSH-Acknowledge Cycle
Cache-Invalidation Cycle (INVD Instruction)

Figure 5-23 shows the cache-invalidation special bus cycle,
which the processor drives in response to the execution of the
INVD instruction. The INVD instruction causes the processor
to invalidate each line in its instruction and data caches.
Modified lines in the data cache are not written back.

Although the execution of INVD is not visible on the bus, the
lack of bus activity while the microcode invalidates the lines in
the internal caches can be seen. When all lines in both caches
are invalidated, the processor drives the cache-invalidation
special bus cycle (BE7-BE0 = FDh). System logic must respond
by asserting BRDY. When it does, the processor typically
begins driving one or more burst reads on the bus to refill its
caches.
Figure 5-23. Cache-Invalidation Cycle (INVD Instruction)
Cache-Writeback and Invalidation Cycle (WBINVD Instruction)



Figure 5-24A and Figure 5-24B show the cache-writeback and
invalidation special bus cycle, followed by the
cache-invalidation special bus cycle. The processor drives
these two special cycles after executing the WBINVD
instruction.

The execution of WBINVD causes the processor to invalidate
each line in its instruction and data caches. If a data cache line
is in the modified state, the line is written back immediately
before being invalidated. During such writebacks, A31-A5
defines the address of the 32-byte location in memory to which
the modified cache line is written back. After all modified
lines are written back and all lines in both caches are
invalidated, the processor first drives the cache-writeback and
invalidation special bus cycle (BE7-BE0 = F7h) and then the
cache-invalidation special bus cycle (BE7-BE0 = FDh). System
logic must respond by asserting BRDY to each of the two
special cycles, as shown in Figure 5-24B.
Figure 5-24B. Cache-Writeback and Invalidation Cycle (WBINVD Instruction) Part 2
Branch-Trace Message Cycles

Figure 5-25 shows the two branch-trace message special bus
cycles that the processor generates for each taken branch
when branch tracing is enabled as described in Section 7.6 on
page 7-17. System logic can accumulate the address and data
bus values for debugging or profiling.

The processor drives these special bus cycles immediately
after each taken-branch instruction is executed. Both special
bus cycles have BE7-BE0 = DFh, and system logic must
respond by asserting BRDY to each of the cycles. The first
cycle identifies the branch source, and the second identifies
the branch target, as shown in Table 5-24.



Table 5-24. Branch-Trace Message Special Bus Cycle Fields

Signals | First Special Bus Cycle        | Second Special Bus Cycle
A31     | 0 = first special bus cycle    | 1 = second special bus cycle
        | (source)                       | (target)
A30-A29 | not valid                      | Operating mode of target:
        |                                | 11 = Virtual-8086 mode
        |                                | 10 = Protected mode
        |                                | 01 = not valid
        |                                | 00 = Real mode
A28     | not valid                      | Default operand size of target
        |                                | segment:
        |                                | 1 = 32-bit
        |                                | 0 = 16-bit
A27-A20 | 0                              | 0
A19-A4  | Code segment (CS) selector of  | Code segment (CS) selector of
        | branch source                  | branch target
A3      | 0                              | 0
D31-D0  | EIP of branch source           | EIP of branch target
Figure 5-25. Branch-Trace Message Cycle
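Trace-capture logic could unpack the fields in Table 5-24 with a decoder such as the one below. This is a sketch: `addr` is the 32-bit byte address whose bits 31-3 are A31-A3, `data` is D31-D0, and the struct and field names are illustrative. For a source cycle, the mode and operand-size fields are "not valid" and are simply zeroed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     is_target;  /* A31: 0 = source cycle, 1 = target cycle */
    uint16_t cs;         /* A19-A4: CS selector */
    uint32_t eip;        /* D31-D0 */
    unsigned mode;       /* A30-A29, target only: 3 = V86, 2 = Prot, 0 = Real */
    bool     op32;       /* A28, target only: 32-bit default operand size */
} branch_trace;

static branch_trace decode_branch_trace(uint32_t addr, uint32_t data)
{
    branch_trace t;
    t.is_target = ((addr >> 31) & 1u) != 0;
    t.cs  = (uint16_t)((addr >> 4) & 0xFFFFu);
    t.eip = data;
    /* Not valid in the source cycle; zeroed rather than decoded. */
    t.mode = t.is_target ? (addr >> 29) & 3u : 0u;
    t.op32 = t.is_target ? ((addr >> 28) & 1u) != 0 : false;
    return t;
}
```

Pairing each source record with the following target record gives one complete taken-branch entry for a profiler.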
5.4.7 Mode Transitions, Reset, and Testing

System logic can control the system-management, clocking, and initialization states of the processor with SMI, STPCLK, INIT, and RESET. The following examples show the processor’s response to some of these signals.

Transition from Normal Execution to SMM

Figure 5-26A and Figure 5-26B show the transition from one of the processor’s normal operating modes (Real, Protected, or Virtual-8086 mode) to System Management Mode (SMM). System logic causes this transition by asserting SMI.

Upon recognizing an SMI interrupt at the next instruction- 
retirement boundary, the processor performs the following 
actions: 

1. Flush Pipeline — The processor invalidates all instructions 
remaining in the pipeline. This is not visible on the bus. 

2. Complete In-Progress Cycle — If the processor had begun a 
bus cycle when SMI was asserted, the processor completes 
the bus cycle and waits until the system asserts the last 
expected BRDY and also asserts EWBE. In Figure 5-26A, a 
burst read is shown completing after SMI is asserted. 

3. Acknowledge — After sampling EWBE asserted, the proces- 
sor asserts SMIACT to acknowledge the interrupt. This is 
visible on the bus after SMI is recognized. At that point, sys- 
tem logic must ensure that all memory accesses during 
SMM are to the SMM memory space. 

4. Save Processor State — The processor saves its state in the 
SMM state-save area. These saves appear at the far right of 
the example in Figure 5-26B. 

5. Disable Interrupts and Debug Traps — The processor disables 
maskable interrupts by clearing the interrupt flag (IF) in EFLAGS, disables NMI interrupts, clears the trap flag (TF) in EFLAGS, and clears the DR7-DR6 debug control and status registers. This is not visible on the bus.

6. Service Interrupt — The processor jumps to the entry point of 
the SMM service routine at the SMM base physical address, 
whose default is 0003_8000h in SMM memory. 

For details on SMM, see Section 6.3 on page 6-23. 
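The EFLAGS and NMI effects of steps 4 and 5 can be modeled as a pure function. The Python sketch below is only a model of the behavior described above, not processor code. It assumes the standard x86 bit positions (TF is EFLAGS bit 8, IF is bit 9), and `smm_entry` is a hypothetical name. Because the state save in step 4 precedes step 5, the saved copy keeps the original IF and TF values.

```python
EFLAGS_TF = 1 << 8   # trap flag
EFLAGS_IF = 1 << 9   # maskable-interrupt enable flag


def smm_entry(eflags: int) -> dict:
    """Model the EFLAGS and NMI effects of SMM entry (steps 4 and 5)."""
    return {
        "saved_eflags": eflags,                          # step 4: saved before any change
        "eflags": eflags & ~(EFLAGS_IF | EFLAGS_TF),     # step 5: IF and TF cleared
        "nmi_enabled": False,                            # step 5: NMI disabled
    }
```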



Bus Cycle Timing 



5-189 




AMpg 

AMD-K5 Processor Technical Reference Manual 



18524C/0- Nov 1996 




Figure 5-26A. Transition from Normal Execution to SMM Part 1
Figure 5-26B. Transition from Normal Execution to SMM Part 2
Stop-Grant and Stop-Clock States



Figure 5-27A and Figure 5-27B show the processor’s transition 
from normal execution to the Stop-Grant state, then to the 
Stop-Clock state, and finally back to normal execution. The 
series of transitions begins when system logic asserts STPCLK. 
Upon recognizing a STPCLK interrupt at the next instruction- 
retirement boundary, the processor performs the following 
actions, in the order shown: 

1. Flush Pipeline — The processor invalidates all instructions 
remaining in the pipeline. This is not visible on the bus. 

2. Complete In-Progress Cycle — If the processor had begun a 
bus cycle or locked operation when STPCLK was asserted, 
the processor completes the bus cycle and waits until the 
system asserts the last expected BRDY and also asserts 
EWBE. If no bus cycle is in progress, system logic must assert EWBE at the same time as, or some time after, it asserts STPCLK. In Figure 5-27A, a burst read is shown
completing after STPCLK is asserted. 

3. Stop-Grant Cycle — After sampling EWBE asserted, the processor drives a Stop-Grant special bus cycle. This cycle is identified by D/C = 0, M/IO = 0, W/R = 1, BE7-BE0 = FBh, and A31-A3 = 10h. System logic must respond by asserting BRDY. This is visible on the bus near the middle of Figure 5-27A.

4. Stop Internal Clock — When system logic returns BRDY for 
the Stop-Grant special bus cycle, the processor stops its 
internal clock and floats D63-D0 and DP7-DP0. This occurs on the bus between Figure 5-27A and Figure 5-27B, immediately after the BRDY of the Stop-Grant special bus cycle.

5. (Optional) Stop Bus Clock — After returning BRDY in 
response to the Stop-Grant special bus cycle, power-man- 
agement logic can transition to the Stop-Clock state by stop- 
ping CLK while STPCLK is held asserted. 

STPCLK must be held asserted throughout the Stop-Grant and (if entered) Stop-Clock states. Figure 5-27B shows the processor resuming normal execution after system logic negates STPCLK.

For details on clock control, see Section 6.4 on page 6-33. 
Figure 5-27A. Stop-Grant and Stop-Clock Modes Part 1
Figure 5-27B. Stop-Grant and Stop-Clock Modes Part 2
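The special bus cycles described in this section can be distinguished by their cycle-definition signals and byte enables. The Python sketch below is a hypothetical classifier for a bus monitor, built only from the encodings given above. A special cycle has D/C = 0, M/IO = 0, and W/R = 1. F7h and FDh are the two WBINVD cycles, DFh is a branch-trace message, and FBh with A31-A3 = 10h is Stop-Grant. The sketch reads "A31-A3 = 10h" as the byte address 0000_0010h; that reading is an interpretation of the notation. Signals are taken as logic levels (1 = High). Other special-cycle encodings are outside this section, so they are reported generically. In every case, system logic must still return BRDY.

```python
from typing import Optional

# BE7-BE0 encodings of the special bus cycles described in this section.
SPECIAL_CYCLES = {
    0xF7: "cache-writeback and invalidation",   # first WBINVD cycle
    0xFD: "cache invalidation",                 # second WBINVD cycle
    0xDF: "branch-trace message",               # two per taken branch
}


def classify_cycle(d_c: int, m_io: int, w_r: int, be: int, addr: int) -> Optional[str]:
    """Name a special bus cycle, or return None for an ordinary bus cycle.

    d_c, m_io, w_r: sampled levels of D/C, M/IO and W/R (1 = High).
    be:   BE7-BE0 as an 8-bit value (a 0 bit means that byte enable is asserted).
    addr: byte address formed from A31-A3 with A2-A0 = 0 (assumed capture format).
    """
    if (d_c, m_io, w_r) != (0, 0, 1):
        return None                      # not a special cycle
    if be == 0xFB:
        # Stop-Grant is FBh with A31-A3 = 10h; other FBh cycles are not covered here.
        return "stop-grant" if addr == 0x10 else "other FBh special cycle"
    return SPECIAL_CYCLES.get(be, "other special cycle")
```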
INIT-Initiated Transition from Protected Mode to Real Mode

INIT is typically asserted in response to a BIOS interrupt that writes to an I/O port. This often happens, for example, when the operator presses Control-Alt-Del: the BIOS writes to a port (such as port 64h in the keyboard controller) that asserts INIT. INIT is also used to support 286 software that must return to Real mode after accessing extended memory in Protected mode. The 286 processor does not have an INIT input; on the 286, a transition from Protected mode to Real mode can only be made by asserting RESET. With the INIT signal, however, the operating system can cause the transition through a BIOS interrupt without loss of cache contents or floating-point state.

Upon recognizing an INIT interrupt at the next instruction- 
retirement boundary, the processor performs the following 
actions, in the order shown: 

1. Flush Pipeline — The processor invalidates the instruction 
pipeline and TLB. This is not visible on the bus. 

2. Reinitialize — The processor reinitializes the general-pur- 
pose and system registers to their reset values. This is also 
not visible on the bus, except as an extended period of inac- 
tivity. 

3. Jump To BIOS — The processor jumps to the BIOS at address 
FFFF_FFF0h, the same entry point used after RESET. This
jump is visible on the far-right side of Figure 5-28 as a burst 
code read. 



Figure 5-28 shows an example in which the operating system 
writes to an I/O port, causing system logic to assert INIT. The 
assertion of INIT starts an extended microcode sequence that 
terminates with a code fetch from the Reset location. 
Figure 5-28. INIT-Initiated Transition from Protected Mode to Real Mode
System Design



This chapter summarizes topics that may be of help to system 
board designers. The discussions touch on the design of mem- 
ory, cache, System Management Mode (SMM), clock control 
(power management), and a few other topics. Many of the 
details that relate to this subject are also covered in Chapter 5, 
which describes the processor’s signals and bus cycles not only 
from the processor’s view, but also from the system’s view. 

Throughout this chapter, the term clock can refer either to the processor’s internal clock or to the bus clock (CLK). The descriptions that follow therefore state explicitly which type of clock is meant.

6.1 Memory 



The processor can be configured for memory bus speeds of 50, 60, or 66 MHz. Main memory can be built from Page-mode or EDO (extended data out) DRAM. On a 66-MHz bus, the read-cycle time for a page hit in EDO DRAM is 7-2-2-2 (7 clocks for the first transfer and 2 clocks for each remaining transfer) and 10-2-2-2 for a page miss. The read-cycle time for a page hit in Page-mode DRAM at 66 MHz is 7-4-4-4, and 10-4-4-4 for a page miss. On a 50-MHz bus, EDO DRAM timing does not change, but Page-mode DRAM timing becomes 6-3-3-3 for a page hit and 8-3-3-3 for a page miss.
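The burst notation converts directly into line-fill time. The Python sketch below sums the clocks of each burst listed above and converts them to nanoseconds at the nominal bus frequency; the helper names are hypothetical. It assumes each burst fills one 32-byte cache line with four 8-byte transfers.

```python
from typing import Dict, List


def parse_burst(spec: str) -> List[int]:
    """'7-2-2-2' -> [7, 2, 2, 2]: bus clocks for each of the four transfers."""
    return [int(n) for n in spec.split("-")]


def burst_clocks(spec: str) -> int:
    """Total bus clocks to read one 32-byte line (four 8-byte transfers)."""
    return sum(parse_burst(spec))


def burst_ns(spec: str, bus_mhz: float) -> float:
    """Duration of the whole burst in nanoseconds at the given bus frequency."""
    return burst_clocks(spec) * 1000.0 / bus_mhz


# Read-cycle timings quoted in the text above.
TIMINGS: Dict[float, Dict[str, str]] = {
    66.0: {"EDO hit": "7-2-2-2", "EDO miss": "10-2-2-2",
           "Page-mode hit": "7-4-4-4", "Page-mode miss": "10-4-4-4"},
    50.0: {"EDO hit": "7-2-2-2", "EDO miss": "10-2-2-2",
           "Page-mode hit": "6-3-3-3", "Page-mode miss": "8-3-3-3"},
}

if __name__ == "__main__":
    for mhz, table in TIMINGS.items():
        for name, spec in table.items():
            print(f"{mhz:4.0f} MHz  {name:<15} {spec:<9} "
                  f"{burst_clocks(spec):2d} clocks  {burst_ns(spec, mhz):6.1f} ns")
```

The totals make the trade-off concrete: at 66 MHz, an EDO page hit fills a line in 13 clocks, compared with 19 clocks for a page hit in Page-mode DRAM.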
6.1.1 Memory Map

Figure 6-1 shows a typical physical memory map for a DOS- 
based desktop system after DOS boots. Various regions of this 
memory map to RAM or ROM on the motherboard and adapter 
boards. The processor hardware imposes only two constraints on the physical memory map implemented by system hardware: the boot address at FFFF_FFF0h, which is accessed when RESET or INIT is asserted, and the default addresses for
SMM. However, other physical memory mapping requirements 
are imposed by BIOS, the operating system, and the specific 
hardware implemented for the system. In general, the conven- 
tions for hardware memory mapping for DOS-based desktop 
systems include the following: 

■ Memory-decoder aliasing of boot ROM space 

■ Cacheable vs. noncacheable address spaces 

■ SMM memory address space (optional) 

Each of these issues is summarized briefly in the sections that 
follow. 
Figure 6-1. Typical Desktop-System BIOS Memory Map
6.1.2 Memory-Decoder Aliasing of Boot ROM Space

The processor boots in Real mode at address FFFF_FFF0h.
However, because the boot ROM space must be accessed after 
the first far jump in the processor’s Real mode, which gener- 
ates 20-bit addresses in the space below 1 Mbyte, the address 
decoder typically aliases the 16-Kbyte physical boot ROM 
space located between FFFF_FFFFh and FFFF_C000h to the 
top of the high memory space, between 000F_FFFFh and 
000F_C000h, as shown in Figure 6-1. 

This reset-address behavior is due to the special way in which 
segment translation is performed in the x86 architecture when 
RESET or INIT is asserted. Normally, a Real-mode 16-bit seg- 
ment selector is shifted left 4 bits to form the segment base, 
and then added to the 16-bit offset to produce a 20-bit address. 
Thus, F000:FFF0 in the selector:offset format becomes a seg- 
ment base of 000F_0000h added to an offset of 0000_FFF0h, 
yielding the physical address 000F_FFF0h. When RESET or 
INIT is asserted, however, the left-shift is not done and the 
high 16 address bits are all set to 1, yielding the physical 
address FFFF_FFF0h. Thereafter, address translation only
begins to work in the normal Real-mode manner when the first 
far jump is executed. This jump loads the code-segment regis- 
ter with a 16-bit segment selector, and this selector-load causes 
the address-translation mechanism to begin working in its nor- 
mal Real-mode manner. 

The system-logic address decoder must make this behavior 
transparent to software by aliasing the physical address 
FFFF_FFF0h to the physical address 000F_FFF0h. As stated
above, it normally does this by aliasing the entire 16-Kbyte 
block between FFFF_FFFFh and FFFF_C000h to between 
000F_FFFFh and 000F_C000h. 
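The arithmetic behind this aliasing can be checked in a few lines. The Python sketch below is a model, not decoder hardware, and its names are hypothetical. It performs normal Real-mode translation and decodes the two 16-Kbyte ROM windows described above.

```python
from typing import Optional

RESET_ADDR = 0xFFFF_FFF0                   # first code fetch after RESET or INIT
BOOT_ROM_TOP = (0xFFFF_C000, 0xFFFF_FFFF)  # 16-Kbyte boot ROM block below 4 Gbytes
BOOT_ROM_LOW = (0x000F_C000, 0x000F_FFFF)  # its alias just below 1 Mbyte


def real_mode_address(selector: int, offset: int) -> int:
    """Normal Real-mode translation: selector shifted left 4 bits plus offset.

    Results above 1 Mbyte (which depend on A20 handling) are not masked here.
    """
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)


def boot_rom_offset(phys: int) -> Optional[int]:
    """Hypothetical decoder: ROM byte offset if phys hits either alias, else None."""
    for lo, hi in (BOOT_ROM_TOP, BOOT_ROM_LOW):
        if lo <= phys <= hi:
            return phys - lo
    return None
```

With this decoder, the reset fetch at FFFF_FFF0h and the fetch at F000:FFF0 (physical 000F_FFF0h) both resolve to ROM offset 3FF0h. This shared offset is what makes the aliasing transparent to software.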

6.1.3 Cacheable and Noncacheable Address Spaces 

When the instruction or data caches are enabled, the processor 
can fill them with any information found in the system-defined 
cacheable address space, including code and data for application programs, BIOS, and the operating system and its system-level data structures. The one exception is that the processor does not fill its instruction or data caches with page directory or page table entries, because these data structures are cached only in CR3
and the TLBs. 

System logic normally defines the cacheable address space by 
implementing external registers which BIOS or other system 
software initializes during boot with the cacheable (or non- 
cacheable) ranges of the address space. Lookups in these regis- 
ters are then used by system logic to control the state of the 
KEN and WB/WT input signals. KEN controls the caching of 
memory reads for both the instruction and data caches, and 
WB/WT (together with the PWT bits written by the operating 
system) controls the MESI state of cacheable read misses and
write hits in the data cache. 

Most or all of the high memory address range, which lies between 640 Kbytes and 1 Mbyte, is typically specified as noncacheable by system logic. BIOS ROM is typically hardware-aliased to addresses in this region. Parts of this region also map locations that should not be cached, such as memory-mapped I/O ports (video, disk, network, and other devices). Thus, system logic typically does not assert KEN during accesses to high memory.

System logic can, of course, drive KEN so as to specify any 
other areas of memory as non-cacheable, although this is nor- 
mally not done. 
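The register lookup that drives KEN can be sketched as a range check. In the Python model below, `CacheabilityRegs` stands in for whatever chipset-specific registers BIOS programs at boot, so both the class and the single example range are hypothetical. KEN is treated as a logical signal (True = asserted), not an electrical level. The example range marks the 640-Kbyte to 1-Mbyte high memory region noncacheable.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CacheabilityRegs:
    """Hypothetical chipset registers, programmed by BIOS at boot, listing
    noncacheable physical ranges as inclusive (low, high) pairs."""
    noncacheable: List[Tuple[int, int]] = field(default_factory=list)

    def ken(self, phys: int) -> bool:
        """Logical KEN for a read cycle: True (asserted) lets the line be cached."""
        return not any(lo <= phys <= hi for lo, hi in self.noncacheable)


# Example setting: the 640-Kbyte to 1-Mbyte high memory region is noncacheable.
regs = CacheabilityRegs(noncacheable=[(0x000A_0000, 0x000F_FFFF)])
```

KEN matters only for reads. As noted in Section 6.2.2, it has no effect on the processor during writes.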

6.1.4 SMM Memory Space and Cacheability 

If the optional System Management Mode (SMM) is imple- 
mented, system logic must ensure that, during SMM, all mem- 
ory accesses are to the SMM memory space rather than to main 
memory. In general, system designs that do not overlap the 
address space of SMM memory and main memory are simpler 
to design and may perform better. Section 6.3 on page 6-23 
summarizes the details of SMM. This section deals only with 
memory usage in SMM. 

Figure 6-2 shows the default map of the SMM memory area. It 
consists of a 64-Kbyte area, between 0003_0000h and 
0003_FFFFh, of which the top 32 Kbytes (0003_8000h to 0003_FFFFh) must be populated with RAM. The SMM service-routine entry point is located at 0003_8000h.
During boot, the address decoder must allow BIOS to address the SMM memory area in the main memory address space without entering SMM. This lets BIOS initialize the area with configuration parameters and the SMM service routine.
Thereafter, the BIOS typically remaps the area from its default 
location in low memory to high or extended memory, as shown 
in Figure 6-1. After the remapping by BIOS, the address 
decoder must allow only the processor to access the SMM mem- 
ory area. Other bus masters must be prevented from accessing 
it, unless the system design specifically calls for such access. 
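The decoder rules above can be expressed as a small routing function. The Python sketch below is hypothetical. It assumes the SMM area stays at its default location, overlapping main memory, with no BIOS remapping. Signals are treated logically. A `boot_open` flag stands in for whatever chipset mechanism lets BIOS initialize the area before the area is locked.

```python
SMM_BASE = 0x0003_0000           # default SMM area: 0003_0000h-0003_FFFFh
SMM_SIZE = 0x1_0000              # 64 Kbytes
SMM_ENTRY = SMM_BASE + 0x8000    # service-routine entry point, 0003_8000h


def route(phys: int, smiact: bool, from_cpu: bool, boot_open: bool) -> str:
    """Hypothetical decoder policy for the default (overlapped) SMM area.

    Returns "smram" or "main". Signals are logical (True = asserted).
    """
    if not (SMM_BASE <= phys < SMM_BASE + SMM_SIZE):
        return "main"            # outside the SMM area: ordinary decode
    if from_cpu and (smiact or boot_open):
        return "smram"           # SMM accesses, or BIOS initializing the area
    return "main"                # other bus masters are kept out of SMM RAM
```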
System logic controls the cacheability of SMM memory with KEN in the same way that it controls the cacheability of main memory. If SMM memory is to be noncacheable, KEN must be held negated from when SMI is asserted until SMIACT is negated. If SMM memory is to be cacheable, KEN must be asserted for cacheable read cycles.
The cacheability of SMM memory has both advantages and disadvantages. By caching SMM memory, the advantage of faster
repetitive accesses is offset by delays due to overwriting cache 
lines that may otherwise be reusable after returning from 
SMM. If the program that was running prior to entry into SMM 
ran out of the cache, and the same program continued to run 
after the return from SMM, the processor would need to refill 
the caches with the same information after returning from 
SMM. If an SMM routine frequently accesses the same loca- 
tions, the delays due to cache refills and writeback-invalidates 
may be worthwhile. But if an SMM routine seldom accesses the 
same locations, the speed of returning and continuing on with 
the prior program might be improved by not caching SMM 
memory. 

If SMM memory space overlaps main memory space that is 
cacheable, FLUSH must be asserted when SMI is asserted so 
that memory accesses in SMM do not hit locations cached from 
main memory. If SMM memory is to be cacheable, FLUSH 
must also be asserted with SMI when entering SMM, and the 
SMM service routine must execute the WBINVD instruction to 
invalidate the caches just prior to executing the RSM instruc- 
tion, which returns the processor from SMM. The use of 
FLUSH or WBINVD adds potentially significant time to the 
entering and leaving of SMM. 



6.2 Cache 



Systems with multiple bus masters that share cacheable mem- 
ory require methods for controlling access to the bus and con- 
trolling the coherency of shared memory. The sections below 
summarize certain principles and methods used by system 
logic, in concert with software, to maintain the coherency of 
the processor’s level-1 (or L1) on-chip caches and optional
level-2 (or L2) external cache. 

The internal architecture of the processor’s L1 instruction and data caches is described in Section 2.3 on page 2-13. The operating system writes the cache disable (CD) and not-writethrough (NW) bits in CR0 to enable and disable caching, independent of hardware. Thereafter, the operating system may write the PCD and PWT bits in the page directory and page table entries to control caching properties for specific
physical pages. The PCD and PWT bits control the state of the 
PCD and PWT output signals, which system logic can use to 
control L2 caching. 



6.2.1 L2 Cache 

To improve system performance, an L2 cache can be added 
between the processor and main memory. The L2 cache can be 
implemented for 3-2-2-2 bursts using 15-ns asynchronous 
SRAM on a 60-MHz or 66-MHz bus. Faster bursts can be implemented with synchronous SRAM (SSRAM): 9-ns SSRAM can achieve 3-1-1-1 bursts at 66 MHz and 10-ns SSRAM can achieve 2-1-1-1
bursts at 50 MHz. 

Most system designs that implement an L2 cache do so using (a) an L2 cache that is significantly larger than the combined sizes of the L1 caches, (b) L2 cache lines that are at least as wide as L1 cache lines (32 bytes or more), and (c) cache-line fills that follow the principle of inclusion, which says that any line in the L1 cache is guaranteed to be in the L2 cache.

The first principle (a larger L2 cache) ensures that the L2 cache can hold data that is not already in the L1 caches. The second principle (an L2 cache line size greater than or equal to the L1 cache line size) can simplify and speed up transfers from the L2 cache to the L1 cache. The third principle (inclusion) can simplify and speed up cache-coherency signaling for inquire cycles. If an inquire cycle misses in the L2 cache, the system can safely assume the line is not in the L1 cache without having to query the processor directly.
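The third principle is what lets a look-through L2 filter inquire cycles. In the hypothetical Python tag model below, only addresses that hit in the L2 need to be forwarded to the processor. An L2 line replacement must also be forwarded, with INV asserted, so that no copy survives in the L1 cache. For simplicity, the sketch assumes that the L2 line is 32 bytes, the same size as an L1 line.

```python
LINE_BYTES = 32  # L1 line size; the L2 line is assumed to be the same size here


def line_of(phys: int) -> int:
    """Align a physical address down to its cache-line address."""
    return phys & ~(LINE_BYTES - 1)


class InclusiveL2Tags:
    """Hypothetical tag store for a look-through L2 that maintains inclusion."""

    def __init__(self) -> None:
        self.lines = set()

    def fill(self, phys: int) -> None:
        # Every line passed on to the L1 is also allocated in the L2.
        self.lines.add(line_of(phys))

    def needs_cpu_inquire(self, phys: int) -> bool:
        # Inclusion: an address that misses the L2 cannot be in the L1,
        # so only L2 hits are forwarded to the processor as inquire cycles.
        return line_of(phys) in self.lines

    def evict(self, phys: int) -> int:
        # Replacing an L2 line: the returned line address must be sent to the
        # processor as an inquire cycle with INV asserted, so no L1 copy survives.
        line = line_of(phys)
        self.lines.discard(line)
        return line
```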

6.2.2 Cacheability and Cache-State Control 

The PCD bits maintained by the operating system are a deter- 
mining factor in the state of the processor’s CACHE output sig- 
nal for each bus cycle. CACHE indicates the processor’s intent 
to drive a read or write cycle as a burst cycle. The signal is only 
asserted for reads that the operating system determines to be 
cacheable, and for writebacks of modified lines. These write- 
backs can be caused by inquire cycles, internal snoops, the 
FLUSH signal, the WBINVD instruction, or cache-line replacements. CACHE is not asserted for cache hits that are writethroughs, which are driven as single writes rather than burst writes.

From the system’s viewpoint, the cacheability of bus cycles is controlled by the KEN and WB/WT inputs, as described in Section 6.1.3 on page 6-4. During reads, system logic can use the assertion of CACHE to initiate a table lookup of cacheable addresses. Such lookups are not normally necessary during writebacks, because the location (having already been cached) is known to be cacheable and KEN has no effect on the processor during writes (only during reads).

The MESI state of a cache-line fill (read miss) or a write hit to a shared line is determined by the states of the PWT bits and the WB/WT input signal. The MESI-state transitions for reads and writes are given in Table 2-2 on page 2-19. Complete descriptions of the signals that control cacheability and cache coherency are given on the following pages:

■ CACHE — Section 5.2.15 on page 5-49

■ EADS — Section 5.2.20 on page 5-58

■ HIT — Section 5.2.25 on page 5-70

■ HITM — Section 5.2.26 on page 5-72

■ INV — Section 5.2.33 on page 5-88

■ KEN — Section 5.2.34 on page 5-89

■ PCD — Section 5.2.39 on page 5-99

■ PWT — Section 5.2.43 on page 5-105

■ WB/WT — Section 5.2.56 on page 5-133

6.2.3 Writethrough vs. Writeback Coherency States

The terms writethrough and writeback apply to two related concepts in a read/write cache like the processor’s L1 data cache or an L2 cache. The following conditions apply to both the writethrough and writeback modes:

■ Memory Writes — There is a relationship between memory writes and their concurrence with cache updates:

• A memory write that occurs concurrently with a cache update to the same location is a writethrough. Writethroughs are driven as single cycles on the bus.

• A memory write that occurs after a previous cache update to the same location is a writeback. Writebacks are driven as burst cycles on the bus.

■ Coherency State — There is a relationship between MESI coherency states and writethrough-writeback coherency states of lines in the cache:

• shared MESI lines are in the writethrough state 

• modified and exclusive MESI lines are in the writeback 
state 

Table 2-2 on page 2-19 gives an overview of cache-access states 
from the viewpoint of both memory writes and coherency 
state. Chapter 5 deals with memory writes. This section deals 
with the coherency state of cache lines. 

Typically, system logic participates in the coherency control of 
individual data-cache lines during read misses and write hits to 
shared lines by driving WB/WT as shown in Tables 5-17 and 5-18 
on page 5-135. The PWT bit also enters into this control, but it 
is written by the operating system rather than system logic. 
Alternatively, system logic can force the on-chip data cache to 
statically observe a writethrough or a writeback protocol by 
tying WB/WT as follows: 

■ Writethrough Protocol — Tie WB/WT Low 

■ Writeback Protocol — Tie WB/WT High 

In the writethrough protocol, a cache line is either in the 
shared or invalid state. All write hits to shared lines in the data 
cache also cause 1-to-8-byte writethroughs to memory. Thus, in
writethrough cache lines, the MESI protocol is not fully 
observed — the line never transitions to the exclusive or modi- 
fied MESI states. In the writeback protocol, by contrast, a 
cache line can be in the shared, exclusive, modified, or invalid 
MESI state. Write hits only cause writethroughs to memory if 
the hit is to a shared line. Writebacks can be caused by inquire 
cycles, internal snoops, the FLUSH signal, the WBINVD 
instruction, or cache-line replacements. 

The advantages and disadvantages of these modes are as fol- 
lows: 

■ Writethrough Protocol: 
• Repetitive writes to the same location are slower than in
writeback mode. 

• No updates to the data cache are hidden from the sys- 
tem. 

• When returning from SMM with SMM memory cache- 
able, there is no need to write back modified lines in the 
data cache, so the mode transition may be faster. (Both 
caches, however, must be invalidated.) 

■ Writeback Protocol: 

• Repetitive writes to the same location are faster than in 
writethrough mode. 

• Updates that hit exclusive or modified lines in the data 
cache are hidden from the system. 

• When returning from SMM, in which SMM memory is 
cacheable, modified lines in the data cache must be writ- 
ten back before invalidating both caches, so the mode 
transition may be slower. 

In single-processor systems with no other caching master, WB/WT is typically tied High. This allows the processor to cache all cacheable reads in the exclusive state, so that subsequent write hits update only the cache. In systems with multiple caching masters, WB/WT can be generated after inquire cycles to all other caching masters by the logical OR of HIT from all of the masters. WB/WT then indicates writethrough whenever any other master reports a hit. This allows the processor to cache reads in the exclusive or modified state only if no other master has a copy.
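The WB/WT generation just described, and its effect on a line fill, can be sketched as follows. This Python model treats signals logically (True = asserted or High) rather than as active-low pin levels, and the names are hypothetical. The sketch assumes a MESI rule based on Pentium-compatible behavior. Under that rule, a fill, or a write hit to a shared line, leaves the line exclusive only when WB/WT is High and PWT is 0. Otherwise the line stays shared. Table 2-2 on page 2-19 is the authoritative source.

```python
from typing import Iterable


def wb_wt_level(other_master_hits: Iterable[bool]) -> bool:
    """Logical WB/WT to drive to the processor (True = High = writeback).

    Combines the HIT results of inquire cycles to every other caching master:
    any hit forces writethrough, so the processor fills the line as Shared.
    """
    return not any(other_master_hits)


def fill_state(wb_wt_high: bool, pwt: bool) -> str:
    """MESI state for a cacheable data-cache read miss (assumed rule)."""
    return "E" if wb_wt_high and not pwt else "S"


def write_hit_shared_state(wb_wt_high: bool, pwt: bool) -> str:
    """State after a write hit to a Shared line, which is always written
    through to memory; the line may then become Exclusive (assumed rule)."""
    return fill_state(wb_wt_high, pwt)
```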

The write-once protocol, as described in Section 6.2.6 on page 
6-19, combines the system visibility features of pure 
writethrough and writeback protocols. While the writeback 
function can support higher performance in systems with a sin- 
gle caching master, the writethrough function is required for 
certain transitions in the write-once protocol in systems with 
multiple caching masters. 

Inquire Cycles 

System logic maintains coherency between external caching 
devices and the processor’s internal caches by driving inquire 
cycles to the processor during shared-memory accesses by 
other caching masters. Inquire cycles are often called snoops or 
invalidations, but these terms are too general to clearly differentiate the function of an inquire cycle from the functions of snoops and invalidations that work or are initiated in quite different ways (see the preface for a short list of definitions). For example, the AMD-K5 and Pentium processors support only inquire cycles and internal snoops to their L1 caches. They do not support continuous address bus watching.

The processor responds to inquire cycles by looking up the 
inquire address in its physical tags. The physical-tag lookups 
are done in parallel with the linear-tag lookups that support 
program execution, so inquire cycles do not normally affect 
processor performance. Even when inquire cycles hit modified 
lines, which require writebacks to memory, only the proces- 
sor’s use of the bus is potentially affected. It can normally con- 
tinue to operate out of its cache during a writeback. 

Inquire cycles are initiated with EADS, INV, and an inquire 
address on A31-A5. In response, the processor asserts HIT if 
the inquire cycle address matches the address of a valid line in 
the instruction or data cache, or it asserts both HIT and HITM 
if the address matches a modified line in the data cache. If 
HITM is asserted, the processor writes the modified line back to 
memory. If INV was asserted with EADS, a hit invalidates the 
line. If INV was negated with EADS, a hit leaves the line in the 
shared state, or transitions it from the modified to shared state. 
On the AMD-K5 processor, the maximum inquire or invalida- 
tion rate with inquire cycles is one every two clocks, because 
HIT and HITM change state two clocks after EADS, and EADS 
can be asserted in the same clock in which HITM is negated. 

The MESI-state transitions for inquire cycles, internal snoops, 
and cache invalidations are given in Table 2-3 on page 2-20 and 
Table 5-11 on page 5-71. 

System logic typically drives inquire cycles to the processor during memory accesses by another bus master. If the processor has a look-through L2 cache, inquire cycles need to be driven to the processor only when a prior inquire cycle hits in the processor’s L2 cache, or during line replacements in the processor’s L2 cache. To implement inquire cycles to the processor or L2 cache for every memory access by another caching master, system logic can generate EADS using the equivalent of ADS from the other caching master.
Inquire cycle logic in systems with look-aside caches can be
simplified by monitoring only HITM and ignoring HIT. This 
works because the resulting state of a hit line is determined 
only by the state of INV during the inquire as follows: 

■ If INV is negated during a hit, the hit line — whether shared, 
exclusive or modified — transitions to the shared state. Thus, 
the inquiring master can safely cache the same data in the 
shared state without knowing whether the inquire cycle hit 
in the processor’s cache (and thus, without system logic 
monitoring HIT). 

■ If INV is asserted during a hit, the hit line — whether shared, 
exclusive or modified — transitions to the invalid state. For 
modified lines, the invalidation occurs after a writeback. 

■ If the inquire cycle misses, irrespective of the state of INV, 
the inquiring master can cache the target data in the shared 
state, although it will not have enough information to cache 
that line in the exclusive state (this requires that HIT be 
monitored). 
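The inquire-cycle behavior described above reduces to a small function of a line’s MESI state and INV. The Python sketch below models one line in the processor’s caches and uses a hypothetical function name. It is a behavioral summary, not a timing model, so it ignores the two-clock delay between EADS and HIT/HITM. The model also makes the point above explicit: INV alone determines the final state, which is why look-aside systems can ignore HIT.

```python
from typing import Tuple


def inquire(state: str, inv: bool) -> Tuple[bool, bool, bool, str]:
    """Response of one cache line to an inquire cycle (EADS).

    state: MESI state before the inquire ("M", "E", "S" or "I").
    inv:   logical state of INV sampled with EADS.
    Returns (hit, hitm, writeback, new_state).
    """
    if state not in ("M", "E", "S", "I"):
        raise ValueError(f"bad MESI state: {state!r}")
    if state == "I":
        return False, False, False, "I"   # miss: HIT and HITM stay negated
    hitm = state == "M"                   # HITM (with HIT) only for modified lines
    new_state = "I" if inv else "S"       # INV alone selects the final state
    return True, hitm, hitm, new_state    # a modified line is written back first
```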

Look-aside caches must implement a signal with which to inform the memory controller that a processor access or an inquire cycle hit the L2 cache, so that main memory does not respond. A version of HIT can be implemented for this purpose.

Inquire cycle logic in systems with a look-through L2 cache 
normally monitors both HIT and HITM from the processor,
because such systems often implement the write-once cache 
protocol. This protocol requires caching in the exclusive state at 
certain transitions, and the exclusive state can only be identi- 
fied if both HIT and HITM are monitored. 

Bus Arbitration for Inquire Cycles 

Before running an inquire cycle, system logic must obtain control of the address bus by asserting AHOLD, BOFF, or HOLD.
These signals provide access to the bus with differing condi- 
tions and speed. 

In most systems, the choice is between BOFF and AHOLD. Due to its slow response time, HOLD is usually considered only when backward compatibility with prior-generation subsystems requires it or when the integrity of in-progress bus cycles is of paramount importance. Support for BOFF is usually needed to resolve potential deadlock problems that arise as a result of inquire cycles, and if BOFF is supported, there is usually no reason to support HOLD. The sections that follow further describe these relative advantages and disadvantages.

BOFF Arbitration

BOFF obtains control of the full bus (address and data) in the 
next clock, intervening in any in-progress bus cycle if neces- 
sary. It provides the fastest response of the three bus-hold 
inputs. The processor floats its outputs in the next clock after the assertion of BOFF. Thus, the signal can be used not only to obtain the bus for inquire cycles but also to resolve deadlock between two bus masters during inquire cycles.

BOFF is useful, and often necessary, in both single-bus and
multiple-bus systems. Because of its ability to help resolve 
deadlock during shared-memory accesses to cached locations, 
it is required in virtually all systems with multiple caching 
masters. For example, if Master A controls the bus and 
attempts to write a memory location that is cached by Master B 
in a modified state, a shared L2 controller could drive an 
inquire cycle to Master B, forcing a writeback. But Master B 
cannot write back until Master A is off the bus. In this case, the 
L2 controller could use HITM from Master B to gate the assertion of BOFF to Master A.

System logic typically drives separate BOFF signals to each 
bus master in the system. The assertion by system logic of 
BOFF to a shared L2 cache for an inquire cycle need not interfere with the processor’s continued operation out of its L1 cache. In addition, the assertion by system logic of BOFF to a look-through L2 cache for an inquire cycle need not interfere with the processor’s continued accesses to that L2 cache.

Figure 6-3 shows an example of BOFF in a system with two 
caching masters — a processor and another caching master — 
sharing the processor bus. A typical sequence for inquire 
cycles that hit a modified line in the processor’s cache might be 
as follows: 

1. The other master (or system logic) asserts BOFF to the pro- 
cessor. 

2. The other master (or system logic) drives an inquire cycle 
(represented by EADS) to the processor. 
3. The processor responds with HITM to system logic.

4. System logic asserts BOFF to the requesting master. (HITM from the processor can be used to generate BOFF.)

5. The other master negates BOFF to the processor so that the processor can write back its modified line to main memory and the shared L2 cache.




A configuration in which both caching masters were on oppo- 
site sides of a shared L2 look-through cache would have some- 
what similar operations, except that the L2 cache controller 
would do much of the signalling ascribed to system logic in Fig- 
ure 6-3. 
AHOLD’s sole function is to support inquire cycles. By
asserting AHOLD, system logic gains control of only the address
bus, leaving the data bus available to the processor for the
completion of an in-progress bus cycle. If an inquire cycle hits
a modified line while AHOLD is asserted, the writeback can
occur while AHOLD is either asserted or negated.

AHOLD is useful primarily in systems with multiple buses and 
multiple bus masters, where operations can occur on the sepa- 
rate buses independently and in parallel, and system logic 
would drive separate AHOLD signals to each caching master. 
This configuration occurs, for example, if the processor shares 
its bus only with a look-through L2 cache, and other caching 
masters work in parallel on a system bus that is isolated by sys- 
tem logic from the L2 cache controller. Figure 6-4 shows such a 
design. 

A typical sequence for inquire cycles that hit modified lines in 
the processor’s cache might be as follows: 

1. The master on the system bus requests access to memory. 

2. System logic responds by asserting BOFF to the processor’s
L2 cache controller.

3. System logic drives an inquire cycle (represented by EADS)
to the L2 controller.

4. The L2 controller responds with HITM to system logic
(assuming the addressed location is cached by the L2).

5. System logic asserts BOFF to the requesting master on the
system bus. (HITM from the L2 controller can be used to
generate BOFF to the other master.)

6. The L2 controller asserts AHOLD to the processor.

7. The L2 controller drives an inquire cycle (represented by
EADS) to the processor.

8. The processor responds with HITM to the L2 controller,
indicating that the processor may have a later copy of the
location than does the L2 cache.

9. System logic negates BOFF to the L2 cache controller so
that the processor can write back its modified line to
memory and the L2 cache.
HOLD Arbitration

System logic can use the HOLD (request) and HLDA (acknowledge)
protocol to gain control of the address and data buses.
Like BOFF, HOLD/HLDA gains control of both the address and
data buses, but only after the processor completes any
in-progress bus cycle or sequence of cycles, such as a locked
cycle. However, unlike BOFF, the HOLD/HLDA protocol cannot
resolve deadlock. In systems where deadlock can occur, BOFF
must be used, and there is no need to support HOLD/HLDA.

6.2.6 Write-Once Protocol 

Among the several write protocols that can be implemented by
the L1 and L2 caches, the write-once protocol is of special
interest for systems in which the processor has an L2 cache on
a separate bus from other caching masters. In such designs, the
write-once protocol allows caching masters to simultaneously
cache shared copies of data until one of the masters writes to
that location, at which time the writing master can have the
data exclusively and other caching masters must invalidate
their copies. The protocol allows other masters to determine
whether the processor has a modified line in its L1 cache by
driving an inquire cycle to the L2 cache, and it allows other
masters, via inquire cycles, to intervene in the processor’s
exclusive use of the data.

Figure 6-5 shows an example. System logic drives separate WB/WT
input signals to the L1 and L2 caches. During line fills and
writes to the L1 cache, the protocol then works as follows:

1. During a read miss, the processor fills a line in the L1. At
the same time, system logic (or the L2) fills a line in the L2
with the same data, and drives the WB/WT input Low
(writethrough) to both the L1 and L2. This leaves the L1
and L2 caches as follows:

L1 cache line in the shared state
L2 cache line in the shared state

2. During the first write to that line, the processor updates the
shared line in the L1 and L2, and writes through to memory.
At the same time, system logic drives the L1 WB/WT input
Low (writethrough) and the L2 WB/WT input High (writeback).
This leaves the L1 and L2 caches as follows:

L1 cache line in the shared state
L2 cache line in the exclusive state

The writethrough to memory must be accompanied by an
invalidation of this line in any other caching master’s cache.

3. During the second write to that line, the processor updates
its shared line and writes through to the exclusive line of the
L2 cache. At the same time, system logic drives the L1 WB/WT
input High (writeback); the L2 WB/WT input can also be
driven but has no effect. This leaves the L1 and L2
caches as follows:

L1 cache line in the exclusive state
L2 cache line in the modified state

(If the design of the L2 permits line transitions directly
from the shared to modified state, the state transitions in
Step 2 can be skipped.)

4. During the next write to that line, the processor updates its
exclusive line. The WB/WT input has no effect. This leaves
the L1 and L2 caches as follows:

L1 cache line in the modified state
L2 cache line in the modified state

5. During all subsequent writes to that line, the processor
simply updates its modified line.

Inquire cycles to the L2 cache that occur between Steps 1 and 3
get a HIT but not a HITM, thus avoiding the need to drive
simultaneous or subsequent inquire cycles to the L1 cache.
These inquire cycles to the L2 cache are done in parallel with
the processor’s L1 and L2 accesses, so they do not reduce the
processor’s performance when it works out of its caches.
However, inquire cycles to the L2 cache that occur after Step 3
get a HITM. In these cases, the L2 cache drives a subsequent
inquire cycle to the L1 cache, which may hold a copy modified
after the last update to the L2 cache. These inquire cycles to
the L1 are done in parallel with the processor’s own L1
accesses, but they block the processor’s access to the L2
cache.
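The line-state sequence in Steps 1 through 5, and the HIT versus HITM result that an inquire to the L2 returns at each point, can be traced with a small C model. This is a hypothetical sketch: it assumes a single line that only the processor writes, that system logic drives WB/WT exactly as the steps describe, and it does not model invalidations by other masters. The type and function names are invented for illustration.

```c
#include <assert.h>

/* MESI state of one cache line. */
typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_t;

/* Result seen by system logic when it drives an inquire cycle to the L2. */
typedef enum { INQ_MISS, INQ_HIT, INQ_HITM } inquire_t;

/* The same line as held in the processor's L1 and in the L2. */
typedef struct { mesi_t l1; mesi_t l2; } line_pair_t;

/* Step 1: read-miss line fill with WB/WT driven Low to both caches. */
static line_pair_t write_once_fill(void)
{
    line_pair_t p = { MESI_S, MESI_S };
    return p;
}

/* Steps 2-5: one processor write to the line (write hits only). */
static line_pair_t write_once_write(line_pair_t p)
{
    assert(p.l1 != MESI_I && p.l2 != MESI_I);
    if (p.l1 == MESI_S && p.l2 == MESI_S) {
        p.l2 = MESI_E;          /* Step 2: L1 WB/WT Low, L2 WB/WT High */
    } else if (p.l1 == MESI_S) {
        p.l1 = MESI_E;          /* Step 3: L1 WB/WT High               */
        p.l2 = MESI_M;          /* writethrough modifies the L2 line   */
    } else if (p.l1 == MESI_E) {
        p.l1 = MESI_M;          /* Step 4: no bus cycle, WB/WT ignored */
    }                           /* Step 5: a modified line stays M     */
    return p;
}

/* Inquire to the L2: HITM (and a follow-on inquire to the L1) occurs
 * only once the L2 line is modified, that is, after Step 3. */
static inquire_t l2_inquire(line_pair_t p)
{
    if (p.l2 == MESI_M)
        return INQ_HITM;
    return p.l2 == MESI_I ? INQ_MISS : INQ_HIT;
}
```

Calling `write_once_write` repeatedly after `write_once_fill` walks the pair through (S,S), (S,E), (E,M), (M,M), matching the numbered steps.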
6.2.7 Cache Invalidations

The term invalidation usually means one of the following
things:

■ Individual Cache Lines — Writebacks and/or invalidations of 
single lines in the instruction and data caches can be done 
with inquire cycles (driven by system logic) or internal 
snoops (initiated by the processor). These invalidations are 
described in Section 6.2.4 on page 6-12, in the section on 
Internal Snooping on page 2-22, and elsewhere throughout 
this manual. 

■ Entire Cache Contents — Writebacks and/or invalidations of 
the entire contents of the instruction and data caches can 
be done with the INVD or WBINVD instructions, or with the 
FLUSH signal. These invalidations are typically performed 
by the operating system or system logic during task or mode 
changes. The invalidations are described on pages 5-65 and 
5-180. 



The MESI-state transitions for cache invalidations are given in 
Table 2-3 on page 2-20. 

6.2.8 A20M Masking of Cache Accesses 

The processor samples A20M only in Real mode, and applies 
A20M masking to its linear cache tags, through which all pro- 
grams access the caches. Thus, assertion of A20M affects all 
program-generated cache addresses, including the following: 

■ Cache-line fills (caused by read misses) 

■ Cache writethroughs (caused by write misses or write hits 
to lines in the shared state) 

■ Cache accesses that occur while the processor does not con- 
trol the bus 

However, A20M does not mask writebacks or invalidations 
caused by the following actions, which are looked up only in 
the physical (not the linear) tags: 

■ Internal snoops 

■ Inquire cycles

■ The FLUSH signal

■ The WBINVD instruction

Asserting A20M masks Real-mode cache addresses even while 
the processor does not control the bus. Thus, if another master 
takes control of the bus and causes the assertion of A20M, this 
masks cache accesses occurring concurrently in the processor. 
However, it does not affect the correct execution of programs, 
because linear and physical addresses are identical in Real 
mode. 

The Pentium processor applies masking only to physical 
addresses, not to linear addresses. This difference between the 
AMD-K5 and Pentium processors of masking linear vs. physical 
addresses is not visible to software because linear and physical 
addresses are identical in Real mode, and the AMD-K5 proces- 
sor samples A20M only in Real mode. 
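The two halves of this section can be illustrated in C. The hypothetical sketch below computes a Real-mode address (segment × 16 + offset), applies A20M masking by forcing address bit 20 to 0 (the 8086-style wraparound that A20M emulates), and classifies the cache operations listed above by whether they are looked up in the linear tags. The names are invented; the classification only restates the two lists in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real-mode address: segment * 16 + offset (can exceed 1 Mbyte). */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* With A20M asserted, address bit 20 is forced to 0, wrapping
 * addresses just above 1 Mbyte back to the bottom of memory. */
static uint32_t apply_a20m(uint32_t addr, bool a20m_asserted)
{
    return a20m_asserted ? (addr & ~(UINT32_C(1) << 20)) : addr;
}

/* Cache operations named in this section (hypothetical labels). */
typedef enum {
    OP_LINE_FILL, OP_WRITETHROUGH, OP_ACCESS_WITHOUT_BUS, /* linear tags   */
    OP_INTERNAL_SNOOP, OP_INQUIRE, OP_FLUSH, OP_WBINVD    /* physical tags */
} cache_op_t;

/* Only lookups through the linear tags are subject to A20M masking. */
static bool a20m_applies(cache_op_t op)
{
    return op == OP_LINE_FILL || op == OP_WRITETHROUGH ||
           op == OP_ACCESS_WITHOUT_BUS;
}
```

For example, FFFFh:0010h forms address 10_0000h, which A20M masks to 0000_0000h.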

6.3 System Management Mode (SMM) 



SMM is an operating mode entered via an interrupt and per- 
formed by an interrupt service routine. It is designed for power 
management and other system control activities that can occur 
transparently to conventional operating systems like DOS and 
Windows. The code and data for SMM are stored in an SMM 
memory area that should be separate from main memory. 

The processor enters SMM when system logic asserts the SMI 
interrupt and the processor acknowledges it with SMIACT, at 
which point the processor saves its state and jumps to the SMM 
service routine. The processor returns from SMM when it exe- 
cutes the RSM (resume) instruction from within the SMM ser- 
vice routine. Upon return, the processor picks up where it left 
off in its prior operating mode, except that special return 
options are provided when the processor enters SMM from the 
Halt state or from a trapped I/O instruction, as described in the 
sections below. 

The sections below summarize the SMM state-save area, entry 
into and exit from SMM, and exceptions and interrupts in 
SMM. Section 6.1.4 on page 6-5 summarizes memory allocation 
and addressing in SMM. The SMI and SMIACT signals are 
described on pages 5-116 and 5-121, respectively. 
6.3.1 Operating Mode and Default Register Values

The software environment in SMM has the following features: 

■ Addressing as in Real mode 

■ 4-Gbyte segment limit 

■ Default 16-bit operand, address, and stack size, although 
instruction prefixes can override these defaults 

■ Control transfers that do not override the default operand 
size truncate the EIP to 16 bits 

■ Far jumps or calls cannot transfer control to a segment with 
a base address requiring more than 20 bits, as in Real mode 
segment-base addressing. 

■ A20M is not recognized (unlike the Pentium processor) 

■ Interrupt vectors use the Real-mode interrupt vector table 
(but see Section 6.3.8 on page 6-32) 

■ The IF flag in EFLAGS is cleared (INTR not recognized)

■ The NMI interrupt is disabled

■ The TF flag in EFLAGS is cleared (single-step traces
disabled)

■ Debug register DR7 is cleared (debug traps disabled) 

Figure 6-2 on page 6-7 shows the default map of the SMM
memory area. It consists of a 64-Kbyte area, between 0003_0000h
and 0003_FFFFh, of which the top 32 Kbytes (0003_8000h to
0003_FFFFh) must be populated with RAM. The default
code-segment (CS) base address for the area, called the SMM
Base Address, is 0003_0000h. The top 512 bytes (0003_FFFFh
down to 0003_FE00h) contain the SMM state-save area, which is
filled from the top down. The default entry point for the SMM
service routine is at 0003_8000h.
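Every address in the default map above is a fixed offset from the SMM base address. The C helper below is a hypothetical sketch (the macro, type, and function names are invented) that derives each boundary from a base value. With the default base of 0003_0000h it reproduces the addresses given above, and the same offsets apply to a relocated base (Section 6.3.4).

```c
#include <assert.h>
#include <stdint.h>

#define SMM_DEFAULT_BASE  UINT32_C(0x00030000)
#define SMM_ENTRY_OFFSET  UINT32_C(0x8000)  /* entry point; start of required RAM */
#define SMM_SAVE_BOTTOM   UINT32_C(0xFE00)  /* low end of the 512-byte save area   */
#define SMM_AREA_TOP      UINT32_C(0xFFFF)  /* last byte of the 64-Kbyte area      */

/* Boundaries of the SMM memory area for a given base. */
typedef struct {
    uint32_t base;         /* SMM base address (CS base)            */
    uint32_t entry;        /* SMM service routine entry point       */
    uint32_t ram_start;    /* start of the top 32 Kbytes (RAM)      */
    uint32_t save_bottom;  /* lowest byte of the state-save area    */
    uint32_t top;          /* highest byte; the save fills down     */
} smm_map_t;

static smm_map_t smm_map(uint32_t base)
{
    smm_map_t m;
    m.base = base;
    m.entry = base + SMM_ENTRY_OFFSET;
    m.ram_start = base + SMM_ENTRY_OFFSET;
    m.save_bottom = base + SMM_SAVE_BOTTOM;
    m.top = base + SMM_AREA_TOP;
    return m;
}
```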

Table 6-1 shows the initial state of registers when entering 
SMM. 
Table 6-1. Initial State of Registers in SMM

Register         Initial Contents
                 Selector  Base                            Attributes         Limit
CS               3000h     0003_0000h (see Section 6.3.4)  16-bit, expand-up  4 Gbytes
DS               0000h     0000_0000h                      16-bit, expand-up  4 Gbytes
ES               0000h     0000_0000h                      16-bit, expand-up  4 Gbytes
FS               0000h     0000_0000h                      16-bit, expand-up  4 Gbytes
GS               0000h     0000_0000h                      16-bit, expand-up  4 Gbytes
SS               0000h     0000_0000h                      16-bit, expand-up  4 Gbytes
General-Purpose  Unmodified
EFLAGS           0000_0002h
EIP              0000_8000h
CR0              Bits 0, 2, 3, 31 cleared (PE, EM, TS, PG). Others are unmodified.
CR4              0000_0000h
GDTR             Unmodified
LDTR             Unmodified
IDTR             Unmodified
TR               Unmodified
DR7              Unmodified
DR6              Undefined
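Two entries in Table 6-1 are simple bit operations that can be expressed in code. The hypothetical C sketch below (macro and function names invented) derives the CR0 value seen by the SMM service routine and records the initial EFLAGS value. The bit positions are the standard x86 definitions of CR0 (PE = bit 0, EM = bit 2, TS = bit 3, PG = bit 31) and EFLAGS (TF = bit 8, IF = bit 9, bit 1 reserved and always set).

```c
#include <assert.h>
#include <stdint.h>

/* CR0 bits cleared on entry to SMM (Table 6-1). */
#define CR0_PE (UINT32_C(1) << 0)   /* Protection Enable  */
#define CR0_EM (UINT32_C(1) << 2)   /* Emulate Coprocessor */
#define CR0_TS (UINT32_C(1) << 3)   /* Task Switched      */
#define CR0_PG (UINT32_C(1) << 31)  /* Paging             */

/* CR0 on entry: PE, EM, TS and PG cleared; every other bit is
 * carried over unmodified from the interrupted context. */
static uint32_t cr0_on_smm_entry(uint32_t cr0)
{
    return cr0 & ~(CR0_PE | CR0_EM | CR0_TS | CR0_PG);
}

/* EFLAGS on entry: only reserved bit 1 is set, so IF (bit 9) and
 * TF (bit 8) are both clear. */
#define SMM_ENTRY_EFLAGS UINT32_C(0x00000002)
```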



6.3.2 SMM State-Save Area 

When the processor acknowledges an SMI interrupt by asserting
SMIACT, it saves its state in the 512-byte SMM state-save
area shown in Table 6-2. The save begins at the top of the SMM
memory area (SMM base address + FFFFh) and fills down to
SMM base address + FE00h.

Table 6-2 shows the offsets in the SMM state-save area relative 
to the SMM base address. The SMM service routine can alter 
any of the read/write values in the state-save area. The con- 
tents of any reserved locations in the state-save area are not 
necessarily the same between the AMD-K5 processor and the 
Pentium or 486 processors. 
Table 6-2. SMM State-Save Area Map

Offset (hex)  Contents                  Size (bits)              Type
FFFC          CR0                       32                       read-only
FFF8          CR3                       32                       read-only
FFF4          EFLAGS                    32                       read/write
FFF0          EIP                       32                       read/write
FFEC          EDI                       32                       read/write
FFE8          ESI                       32                       read/write
FFE4          EBP                       32                       read/write
FFE0          ESP                       32                       read/write
FFDC          EBX                       32                       read/write
FFD8          EDX                       32                       read/write
FFD4          ECX                       32                       read/write
FFD0          EAX                       32                       read/write
FFCC          DR6 (FFFF_CFF3h)          32                       read-only
FFC8          DR7                       32                       read-only
FFC4          TR                        16 (upper 16 reserved)   read-only
FFC0          LDTR                      16 (upper 16 reserved)   read-only
FFBC          GS                        16 (upper 16 reserved)   read-only
FFB8          FS                        16 (upper 16 reserved)   read-only
FFB4          DS                        16 (upper 16 reserved)   read-only
FFB0          SS                        16 (upper 16 reserved)   read-only
FFAC          CS                        16 (upper 16 reserved)   read-only
FFA8          ES                        16 (upper 16 reserved)   read-only
FFA4          I/O Trap Dword            32 (See Section 6.3.6)   read-only
FFA0          reserved                  32                       -
FF9C          I/O Trap EIP              32                       read-only
FF98          reserved                  32                       -
FF94          reserved                  32                       -
FF90          IDT Base                  32                       read-only
FF8C          IDT Limit                 16 (upper 16 reserved)   read-only
FF88          GDT Base                  32                       read-only
FF84          GDT Limit                 16 (upper 16 reserved)   read-only
FF80          TR Attributes             12 (upper 20 reserved)   read-only
FF7C          TR Base                   32                       read-only
FF78          TR Limit                  20 (upper 12 reserved)   read-only
FF74          LDT Attributes            12 (upper 20 reserved)   read-only
FF70          LDT Base                  32                       read-only
FF6C          LDT Limit                 20 (upper 12 reserved)   read-only
FF68          GS Attributes             12 (upper 20 reserved)   read-only
FF64          GS Base                   32                       read-only
FF60          GS Limit                  20 (upper 12 reserved)   read-only
FF5C          FS Attributes             12 (upper 20 reserved)   read-only
FF58          FS Base                   32                       read-only
FF54          FS Limit                  20 (upper 12 reserved)   read-only
FF50          DS Attributes             12 (upper 20 reserved)   read-only
FF4C          DS Base                   32                       read-only
FF48          DS Limit                  20 (upper 12 reserved)   read-only
FF44          SS Attributes             12 (upper 20 reserved)   read-only
FF40          SS Base                   32                       read-only
FF3C          SS Limit                  20 (upper 12 reserved)   read-only
FF38          CS Attributes             12 (upper 20 reserved)   read-only
FF34          CS Base                   32                       read-only
FF30          CS Limit                  20 (upper 12 reserved)   read-only
FF2C          ES Attributes             12 (upper 20 reserved)   read-only
FF28          ES Base                   32                       read-only
FF24          ES Limit                  20 (upper 12 reserved)   read-only
FF20          reserved                  32                       -
FF1C          reserved                  32                       -
FF18          reserved                  32                       -
FF14          CR2                       32                       read-only
FF10          CR4                       32                       read-only
FF0C          I/O Restart ESI           32                       read-only
FF08          I/O Restart ECX           32                       read-only
FF04          I/O Restart EDI           32                       read-only
FF02          Halt Restart Slot         16 (See Section 6.3.5)   read/write
FF00          I/O Trap Restart Slot     16 (See Section 6.3.7)   read/write
FEFC          SMM Revision Identifier   32 (See Section 6.3.3)   read-only
FEF8          SMM Base Address          32 (See Section 6.3.4)   read/write
FE00-FEF4     reserved                  32                       -

Notes:

1. Locations marked "reserved" may change in future processors.

2. Writing locations marked as "read-only" has unpredictable results.
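Because every offset in Table 6-2 is relative to the SMM base address and the save area spans offsets FE00h through FFFFh, code that works on a copy of the area only needs to subtract FE00h to find an entry. The C sketch below is hypothetical: it assumes the 512 bytes are available as an array, the enum and function names are invented, and only the offsets come from the table. Dwords are assembled explicitly in little-endian order, as x86 stores them.

```c
#include <assert.h>
#include <stdint.h>

/* A few offsets from Table 6-2, relative to the SMM base address. */
enum {
    SMM_SAVE_BASE_OFF = 0xFE00,  /* lowest byte of the 512-byte area */
    SMM_OFF_CR0       = 0xFFFC,  /* read-only  */
    SMM_OFF_EFLAGS    = 0xFFF4,  /* read/write */
    SMM_OFF_EIP       = 0xFFF0,  /* read/write */
    SMM_OFF_EAX       = 0xFFD0,  /* read/write */
    SMM_OFF_REVISION  = 0xFEFC,  /* read-only  */
    SMM_OFF_SMBASE    = 0xFEF8   /* read/write */
};

/* save[] holds the bytes at SMM base + FE00h through FFFFh. */
static uint32_t smm_save_read32(const uint8_t save[512], unsigned off)
{
    assert(off >= SMM_SAVE_BASE_OFF && off + 4u <= 0x10000u);
    const uint8_t *p = save + (off - SMM_SAVE_BASE_OFF);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void smm_save_write32(uint8_t save[512], unsigned off, uint32_t v)
{
    assert(off >= SMM_SAVE_BASE_OFF && off + 4u <= 0x10000u);
    uint8_t *p = save + (off - SMM_SAVE_BASE_OFF);
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

A service routine would normally change only read/write entries, such as EIP or EFLAGS. As Note 2 states, writing read-only entries has unpredictable results.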



6.3.3 SMM Revision Identifier 

The SMM revision identifier at offset FEFCh in the SMM state- 
save area specifies the version of SMM and the extensions that 
are available on the processor. The SMM revision identifier 
fields are as follows: 

■ Bits 31-18 — reserved

■ Bit 17 — SMM base address relocation (always 1 = enabled)

■ Bit 16 — I/O trap restart (always 1 = enabled)

■ Bits 15-0 — SMM revision level = 0000h

These fields are the same as in the Pentium processor. Unlike 
the Pentium processor, however, the I/O trap restart and the 
SMM base address relocation functions are always enabled in 
the AMD-K5 processor and do not need to be specifically 
enabled. 
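An SMM service routine meant to run on both the AMD-K5 and Pentium processors can read the identifier at offset FEFCh and test the feature bits instead of assuming them. The C sketch below decodes the fields listed above; the structure and function names are hypothetical, and the reserved bits 31-18 are ignored. On the AMD-K5 processor, both feature bits would be expected to read as 1 and the revision level as 0000h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded SMM revision identifier (offset FEFCh). */
typedef struct {
    bool     base_relocation;  /* bit 17 */
    bool     io_trap_restart;  /* bit 16 */
    uint16_t revision;         /* bits 15-0 */
} smm_revision_t;

static smm_revision_t decode_smm_revision(uint32_t id)
{
    smm_revision_t r;
    r.base_relocation = ((id >> 17) & 1u) != 0;
    r.io_trap_restart = ((id >> 16) & 1u) != 0;
    r.revision = (uint16_t)(id & 0xFFFFu);   /* bits 31-18 reserved */
    return r;
}
```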

6.3.4 SMM Base Address 

During RESET, the processor sets the code-segment (CS) base
address for the SMM memory area (the SMM Base Address) to
its default, 0003_0000h. The SMM service routine can change the
SMM base address at offset FEF8h in the SMM state-save area to
any address that is aligned to a 32-Kbyte boundary. (A base
address that is not aligned to a 32-Kbyte boundary causes the
processor to enter the Shutdown state when it executes the
RSM instruction.)
If the SMM base address is rewritten, the processor saves its
state at the new base address the next time SMM is entered,
and each time thereafter, until RESET. The relocated
addresses for the SMM memory will then be as follows:

■ SMM base address — Default: 0003_0000h. Relocated: offset
FEF8h in the SMM state-save area (see Table 6-2)

■ Service Routine Entry Point — SMM base address + 8000h 

■ Top — SMM base address + FFFFh 

This SMM base address relocation feature is compatible with
the Pentium processor’s analogous feature. The following
pseudo-code implements a relocatable SMM base address in
BIOS:

begin
{
    if SMI Handler is to be Relocated then
    {
        set SMM Base Address (offset FEF8h)
        resume
    }
    else
    {
        SMM execution to begin at relocation area,
        resume
    }
}
end
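The relocation rules in this section (32-Kbyte alignment, Shutdown on a misaligned value at RSM, and use of the new base from the next SMI onward) can be captured in a small model. This C sketch is hypothetical: the type and function names are invented, and it assumes the processor takes the new base from offset FEF8h when RSM executes. That assumption is consistent with the misalignment check described above but is not otherwise specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMM_DEFAULT_BASE  UINT32_C(0x00030000)
#define SMM_BASE_ALIGN    UINT32_C(0x8000)   /* 32-Kbyte alignment */
#define SMM_ENTRY_OFFSET  UINT32_C(0x8000)

/* Hypothetical model of the processor's SMM base register. */
typedef struct {
    uint32_t smbase;    /* base used on the next entry into SMM */
    bool     shutdown;  /* set when RSM finds a misaligned base */
} smm_cpu_t;

/* RESET restores the default base. */
static void smm_reset(smm_cpu_t *cpu)
{
    cpu->smbase = SMM_DEFAULT_BASE;
    cpu->shutdown = false;
}

/* Assumed behavior at RSM: the value left at offset FEF8h becomes the
 * base for the next SMI, or the processor shuts down if that value is
 * not aligned to a 32-Kbyte boundary. */
static void smm_rsm(smm_cpu_t *cpu, uint32_t saved_smbase)
{
    if (saved_smbase % SMM_BASE_ALIGN != 0) {
        cpu->shutdown = true;
        return;
    }
    cpu->smbase = saved_smbase;
}

/* Service routine entry point used for the next SMI. */
static uint32_t smm_entry_point(const smm_cpu_t *cpu)
{
    return cpu->smbase + SMM_ENTRY_OFFSET;
}
```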

To relocate the SMM base address above the 1-Mbyte limit
imposed by Real-mode segment addressing, use the address-size
override prefix to generate the offset in 32-bit registers. If the
SMM base address is relocated to a block below 16 Mbytes,
data in the DS segment (which has a segment base of
0000_0000h) can be accessed by the following code:

mov ebx, 00FExxxxh    ; 64K segment from 00FE_0000h to 00FE_FFFFh
mov ax, ds:[ebx]
6.3.5 Halt Restart Slot

During entry into SMM, the halt restart slot at offset FF02h in
the SMM state-save area specifies whether SMM was entered from
the Halt state. Before returning from SMM, the halt restart slot
can be written by the SMM service routine to specify whether 
the return from SMM should take the processor back to the 
Halt state or to the instruction-execution state specified by the 
SMM state-save area. 

On entry into SMM, the halt restart slot is configured as fol- 
lows: 

■ Bits 15-1 — Undefined 

■ Bit 0 — Point of entry to SMM: 

1 = entered from Halt state. 

0 = not entered from Halt state 

Before return from SMM, the halt restart slot can be written 
as: 

■ Bits 15-1 — Undefined 

■ Bit 0 — Point of return from SMM 

1 = return to Halt state 

0 = return to state specified by SMM state-save area 

The fields of the halt restart slot are the same as in the Pen- 
tium processor auto halt restart slot. During entry into and exit 
from SMM, the processor writes or reads only bit 0 of the 16-bit 
value although the entire 16 bits can be read or written by the 
service routine. The Pentium-compatible pseudo-code for 
implementing the halt restart slot in BIOS is as follows: 

begin
{
    if return to Halt state then
    {
        if SMI# during Halt state then
            set halt restart slot to 00FFh
    }
}
end

If the return takes the processor back to the Halt state, the 
HLT instruction is not refetched, but the Halt special bus cycle 
is driven on the bus after the return. 
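The bit-0 handling described in this section can be sketched in C. The function names are hypothetical, and the sketch is a variant of the pseudo-code rather than a copy of it: instead of writing 00FFh to the whole slot, it changes only bit 0 (the only bit the processor reads on return), and it also clears bit 0 explicitly when the routine does not want to resume the Halt state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* On entry: bit 0 of the halt restart slot (offset FF02h) is 1 if SMI
 * was recognized while the processor was in the Halt state. */
static bool smm_entered_from_halt(uint16_t slot)
{
    return (slot & 1u) != 0;
}

/* Hypothetical BIOS policy: return to Halt only if that is wanted and
 * SMM was entered from Halt. Bits 15-1 are left untouched. */
static uint16_t smm_halt_restart_value(uint16_t slot, bool want_halt)
{
    if (want_halt && smm_entered_from_halt(slot))
        return (uint16_t)(slot | 1u);
    return (uint16_t)(slot & ~1u);
}
```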
6.3.6 I/O Trap Dword

If the assertion of SMI is recognized on the boundary of an I/O
instruction, the I/O trap dword at offset FFA4h in the SMM
state-save area contains information about the instruction. The
fields of the I/O trap dword are configured as follows:

■ Bits 31-16 — I/O port address

■ Bits 15-2 — reserved

■ Bit 1 — Valid I/O instruction (1 = valid, 0 = invalid)

■ Bit 0 — Input or output instruction (1 = INx, 0 = OUTx)

The I/O trap dword is related to the I/O trap restart slot, 
described below. Bit 1 of the I/O trap dword (the valid bit) 
should be tested if the I/O trap restart slot is to be changed. 
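The field layout above maps directly onto shifts and masks. The hypothetical C decoder below (type and function names invented) extracts the three defined fields of the I/O trap dword and ignores the reserved bits 15-2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded I/O trap dword (offset FFA4h). */
typedef struct {
    uint16_t port;      /* bits 31-16: I/O port address        */
    bool     valid;     /* bit 1: 1 = valid I/O instruction    */
    bool     is_input;  /* bit 0: 1 = INx, 0 = OUTx            */
} io_trap_info_t;

static io_trap_info_t decode_io_trap_dword(uint32_t dw)
{
    io_trap_info_t t;
    t.port = (uint16_t)(dw >> 16);
    t.valid = ((dw >> 1) & 1u) != 0;
    t.is_input = (dw & 1u) != 0;
    return t;
}
```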

6.3.7 I/O Trap Restart Slot 

The I/O trap restart slot at offset FF00h in the SMM state-save
area specifies whether the assertion of SMI was recognized on
the boundary of an I/O instruction, and if so, whether the
trapped I/O instruction should be re-executed on return from
SMM. This is sometimes called the I/O-instruction restart
function. Re-executing a trapped I/O instruction is useful, for
example, if an I/O write to disk finds the disk powered down.
The system logic monitoring such an access can assert SMI, and
the SMM service routine can then query system logic, find the
failed I/O write, take action to power up the I/O device, enable
the I/O trap restart slot feature, and return.

The fields of the I/O trap restart slot are configured as follows:

■ Bits 31-16 — reserved

■ Bits 15-0 — I/O instruction restart on return from SMM:

0000h = execute the next instruction after the trapped I/O
instruction

00FFh = re-execute the trapped I/O instruction

The processor initializes the I/O trap restart slot to 0000h upon
entry into SMM. If SMM was entered due to a trapped I/O
instruction, the processor indicates the validity of the I/O
instruction by setting or clearing bit 1 of the I/O trap dword at
offset FFA4h in the SMM state-save area, as described in
Section 6.3.6. The SMM service routine should test bit 1 of the
I/O trap dword to determine the validity of the I/O instruction
before writing the I/O trap restart slot. If the I/O instruction
was valid, the SMM service routine can safely rewrite the I/O
trap restart slot with the value 00FFh, which causes the
processor to re-execute the trapped I/O instruction when the RSM
instruction is executed. If the I/O instruction was invalid,
writing the I/O trap restart slot has undefined results. If
sequential SMI interrupts occur, the second entry into SMM will
never have bit 1 of the I/O trap dword set, and the second SMM
service routine should not rewrite the I/O trap restart slot.

The pseudo-code for implementing I/O Trap Restart in BIOS is 
as follows: 

begin
{
    if I/O instruction needs to be restarted then
    {
        if valid I/O instruction (test offset FFA4h) then
            set I/O restart slot (offset FF00h) to 00FFh
    }
}
end
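The I/O Trap Restart pseudo-code can be made concrete by operating on a copy of the state-save area. In the hypothetical C sketch below, the 512-byte array stands for SMM base + FE00h through FFFFh, the helper names are invented, and the caller decides whether a restart is needed (for example, after powering up the device). Only the offsets, the bit 1 test, and the value 00FFh come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SAVE_AREA_BOTTOM     0xFE00u
#define IO_TRAP_DWORD_OFF    0xFFA4u   /* Table 6-2 */
#define IO_RESTART_SLOT_OFF  0xFF00u   /* Table 6-2 */

/* Little-endian accessors on a 512-byte copy of the save area. */
static uint32_t rd32(const uint8_t *save, unsigned off)
{
    const uint8_t *p = save + (off - SAVE_AREA_BOTTOM);
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
           (uint32_t)p[3] << 24;
}

static void wr16(uint8_t *save, unsigned off, uint16_t v)
{
    uint8_t *p = save + (off - SAVE_AREA_BOTTOM);
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

/* Request re-execution only if the trapped I/O instruction is marked
 * valid (bit 1 of the I/O trap dword). Returns true if the slot was
 * written; otherwise the slot keeps the 0000h set on entry. */
static bool smm_request_io_restart(uint8_t *save, bool needs_restart)
{
    if (!needs_restart)
        return false;
    if ((rd32(save, IO_TRAP_DWORD_OFF) & 0x2u) == 0)
        return false;
    wr16(save, IO_RESTART_SLOT_OFF, 0x00FFu);
    return true;
}
```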

During a simultaneous SMI I/O-instruction trap and debug 
breakpoint trap, the AMD-K5 processor first responds to the 
SMI and postpones writing the exception-related information 
to the stack until after the return from SMM via the RSM 
instruction. If debug registers DR3-DR0 are used in SMM, they 
must be saved and restored by the SMM software. The proces- 
sor automatically saves and restores DR7-DR6. If the I/O trap 
restart slot in the SMM state-save area is written with the 
value 00FFh when the RSM instruction is executed, the debug
trap does not occur until after the I/O instruction is re-exe- 
cuted. 

6.3.8 Exceptions and Interrupts in SMM 

When SMM is entered, the processor disables both INTR and 
NMI interrupts. On both the AMD-K5 and Pentium processors, 
INTR interrupts are disabled by clearing the IF flag in 
EFLAGS. But the mechanism by which NMI interrupts are dis- 
abled and subsequently recognized differs between the 
AMD-K5 and Pentium processors. 
During SMM, the Pentium processor does not respond to NMI
until the beginning of its response to the first INTR or software 
interrupt (INTn) to occur after entering SMM. NMIs can thus 
be enabled by using a dummy interrupt. When an INTR or soft- 
ware interrupt is recognized, the processor first responds to a 
pending NMI interrupt before executing the first instruction of 
the INTR handler. By contrast, the AMD-K5 processor recog- 
nizes a pending NMI interrupt after returning (via the IRET 
instruction) from a prior interrupt. 

The same dummy interrupt used on the Pentium processor to 
enable NMI recognition during SMM works on the AMD-K5 
processor. The only difference is that the AMD-K5 processor 
responds to the NMI after the IRET of the dummy interrupt 
whereas the Pentium processor responds at the beginning of 
the dummy interrupt. All other exceptions and interrupts 
within SMM are fully compatible with those supported by the 
Pentium processor in SMM. 

The IF flag in EFLAGS is cleared automatically when the pro- 
cessor enters SMM, thus disabling maskable interrupts. The 
HLT instruction should not be executed in SMM without first 
setting the IF bit. 

Table 5-2 on page 5-8 and Table 5-3 on page 5-16 summarize the 
behavior of all interrupts in SMM. 

6.3.9 SMM Compatibility with Pentium Processor 

The differences in SMM functions between the AMD-K5 and 
Pentium processors are described in Section A.5 on page A-12.

6.4 Clock Control 



The processor’s consumption of power can be controlled by 
reducing the frequency of the processor and/or bus clocks 
when there is no computational or user activity. System logic 
initiates this control by asserting STPCLK, which causes the
processor to complete any in-progress bus cycle and enter the
Stop Grant state (processor’s internal clock stopped), from
which system logic can subsequently transition the processor
to its Stop Clock state (CLK stopped). These clock control
functions can be entered from any of the processor’s normal
operating modes (Real, Virtual-8086, or Protected mode), from
system management mode (SMM), or from the Halt state.

In typical PC systems that implement power control, the STP- 
CLK, CLK, and SMI signals are driven by external power man- 
agement logic that monitors activity on the address and cycle- 
definition signals. In a typical case, the power management 
logic may notice that, after it has initiated SMM to power
down one or more I/O devices, another several minutes have
elapsed without activity. The power management logic can then
assert SMI again. The SMM service routine obtains the relevant
information, decides to power down the processor itself, and
communicates that decision to the power management logic,
which asserts STPCLK to the processor and, optionally, stops
driving CLK to the processor and other logic. For details on
SMI and STPCLK, see pages 5-116 and 5-122, respectively.



6.4.1 State Transitions 

The five states in the processor’s clock-control protocol, as 
shown in Figure 6-6, are as follows: 

■ Normal Execution: Real mode, Virtual-8086 mode, Protected 
mode, or System Management Mode (SMM). In this state, 
all clocks run at full speed. 

■ Halt State 

■ Stop Grant State

■ Stop Grant Inquire State

■ Stop Clock State 

The sections below describe each of the four low-power states. 

6.4.2 Halt State 

The processor enters the Halt state from the normal operating 
modes (Real, Protected, or Virtual-8086) or SMM when it exe- 
cutes the HLT instruction. The processor leaves the Halt state 
and returns to its prior operating mode when RESET, SMI, 
INIT, NMI, or INTR is asserted. If STPCLK is asserted within 
the Halt state, the processor transitions to the Stop Grant 
state, and it returns to the Halt state when STPCLK is negated.
No processor registers are saved before entering the Halt state 
because the processor returns to the next unexecuted instruc- 
tion in program order when it returns to its prior operating 
mode. When the processor returns to the Halt state, the HLT 
instruction is not refetched but the processor drives the Halt 
special bus cycle on the bus after the return. 

Within the Halt state, the processor disables the majority of its 
internal clock distribution and (if STPCLK is asserted) the 
internal pullup resistor on STPCLK. However, its phase-lock 
loop still runs, its key internal logic is still clocked, most of its 
inputs and outputs retain their last state (except D63-D0 and 
DP7-DP0 which are floated), and it still responds to input sig- 
nals. 

The HLT instruction is commonly executed by modern UNIX- 
type operating systems as a method of entering an idle loop. 
The operating system sees that it has no pending processes, 
therefore nothing to execute, so it executes HLT. Entry into 
the Halt state achieves the same power-saving effect as entry 
into the Stop Grant state, but the method is simpler and faster. 
Entry into the Halt state requires only the execution of the 
HLT instruction, whereas entry into the Stop Grant state 
requires that system logic monitor system activity, assert STP- 
CLK, and decode the processor’s acknowledgment (potentially 
several clocks later) via the Stop Grant special bus cycle. 
6.4.3 Stop Grant State

The assertion of STPCLK causes the processor to enter the 
Stop Grant state. The processor can enter the Stop Grant state 
from the normal operating modes (Real, Protected, or Virtual- 
8086), SMM, or the Halt state. 

When STPCLK is negated, the processor returns to the mode 
from which it entered. If the processor entered the Stop Grant 
state from the Halt state, negation of STPCLK returns the pro- 
cessor to the Halt state. Otherwise, negation of STPCLK or 
assertion of RESET returns the processor to the normal operat- 
ing mode or SMM, from which it entered. If INIT is asserted in 
the Stop Grant state, the signal is latched and acted upon after 
STPCLK is negated. No processor registers are saved before 
entering the Stop Grant state because the processor returns to 
the next unexecuted instruction in program order when it 
returns to its prior operating mode. 

Within the Stop Grant state (as in the Halt state) the majority 
of the processor’s internal clock distribution and all internal 
pullup resistors are disabled. However, its phase-lock loop still 
runs, its key internal logic is still clocked, most of its inputs 
and outputs retain their last state (except D63-D0 and
DP7-DP0, which are floated), and it still responds to input signals.

6.4.4 Stop Grant Inquire State 

An inquire cycle driven while the processor is in the Halt or 
Stop Grant state causes the processor to transition to the Stop 
Grant Inquire state. As for inquire cycles driven from any 
other state, system logic must assert AHOLD, BOFF, or HOLD 
to obtain the address bus before driving EADS, INV, and the 
inquire address. 

The processor responds normally to an inquire cycle by driving 
HITM and/or HIT and performing any necessary cache-state 
transition. If HITM is asserted, the processor drives a normal 
writeback (immediately if AHOLD is asserted, or delayed if 
BOFF or HLDA is asserted) and returns to the state from which 
it entered the Stop Grant Inquire state in the clock in which it 
negates HITM. If HITM is not asserted, the processor returns 
from the Stop Grant Inquire state to the state from which it 
entered, two clocks after EADS. 
6.4.5 Stop Clock State

The processor enters the Stop Clock state when system logic 
turns off CLK while STPCLK is asserted. This is the minimum- 
power state and it can only be entered from the Stop Grant 
state, after BRDY has been returned for the Stop Grant special 
bus cycle. In the Stop Clock state, the processor’s phase-lock 
loop and I/O buffers are disabled, except for the I/O buffers on 
CLK and the JTAG signals. System logic should not change the 
state of any signals, and the processor does not recognize any 
signal edges in the Stop Clock state. 

When CLK is restarted, the processor returns to the Stop Grant 
state, responds to inputs in the next clock, but cannot drive bus 
cycles until its phase-lock loop is synchronized. The latter 
takes several clocks (see the data sheet for this specification). 
The CLK can be driven with a different frequency and/or the 
bus-to-processor clock ratio can be changed on the BF input(s) 
upon restarting CLK. 
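The clock-control transitions described in Sections 6.4.3 through 6.4.5 can be sketched as a small state machine. The C model below is illustrative only (for example, as a starting point for a system-logic simulation); the state and event names are hypothetical, and exits from the Halt state by interrupt or RESET, HLT executed within SMM, and INIT latching are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the clock-control states of Sections 6.4.3-6.4.5.
 * Names are illustrative; they are not AMD-defined identifiers. */
typedef enum {
    ST_NORMAL,           /* Real, Protected, or Virtual-8086 mode */
    ST_SMM,
    ST_HALT,
    ST_STOP_GRANT,
    ST_STOP_GRANT_INQ,   /* Stop Grant Inquire */
    ST_STOP_CLOCK
} clk_state;

typedef enum {
    EV_HLT,              /* HLT instruction executed */
    EV_STPCLK_ASSERT,
    EV_STPCLK_NEGATE,
    EV_INQUIRE,          /* inquire cycle driven (EADS asserted) */
    EV_INQUIRE_DONE,     /* HITM negated, or two clocks after EADS */
    EV_CLK_STOP,         /* CLK turned off after BRDY for the Stop Grant cycle */
    EV_CLK_START
} clk_event;

typedef struct {
    clk_state state;
    clk_state before_stop_grant;  /* where STPCLK negation returns to */
    clk_state before_inquire;     /* Halt or Stop Grant */
} clk_model;

static void enter_stop_grant(clk_model *m)
{
    m->before_stop_grant = m->state;
    m->state = ST_STOP_GRANT;
}

/* Apply one event; events not recognized in the current state are ignored. */
void clk_step(clk_model *m, clk_event ev)
{
    assert(m != NULL);
    switch (m->state) {
    case ST_NORMAL:
        if (ev == EV_HLT)
            m->state = ST_HALT;
        else if (ev == EV_STPCLK_ASSERT)
            enter_stop_grant(m);
        break;
    case ST_SMM:
        if (ev == EV_STPCLK_ASSERT)
            enter_stop_grant(m);
        break;
    case ST_HALT:
        if (ev == EV_STPCLK_ASSERT) {
            enter_stop_grant(m);
        } else if (ev == EV_INQUIRE) {
            m->before_inquire = ST_HALT;
            m->state = ST_STOP_GRANT_INQ;
        }
        break;
    case ST_STOP_GRANT:
        if (ev == EV_STPCLK_NEGATE) {
            m->state = m->before_stop_grant;   /* Halt, SMM, or normal mode */
        } else if (ev == EV_INQUIRE) {
            m->before_inquire = ST_STOP_GRANT;
            m->state = ST_STOP_GRANT_INQ;
        } else if (ev == EV_CLK_STOP) {
            m->state = ST_STOP_CLOCK;
        }
        break;
    case ST_STOP_GRANT_INQ:
        if (ev == EV_INQUIRE_DONE)
            m->state = m->before_inquire;
        break;
    case ST_STOP_CLOCK:
        /* No signal edges are recognized until CLK restarts; bus cycles
         * must still wait for the phase-lock loop to resynchronize. */
        if (ev == EV_CLK_START)
            m->state = ST_STOP_GRANT;
        break;
    }
}
```

For example, starting in ST_NORMAL, the event sequence HLT, STPCLK asserted, STPCLK negated leaves the model in ST_HALT, matching the rule that a processor entering Stop Grant from Halt returns to Halt.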

6.4.6 Clock Control Compatibility with Pentium Processor 

The differences in clock control functions between the AMD-K5 and Pentium processors are described in Section A.5 on page A-12.

6.5 Power and Ground Design 



All of the processor input signals operate at 3 V except CLK, 
which can operate at 3 V or 5 V. Compatible 3-V chipsets are 
available. If your system operates at 5 V, chipsets that provide 
5-V to 3-V voltage translators are available, or you can provide 
the translators on your system board. (If you use voltage trans- 
lators, they must be fast enough to support your bus speed.) 

Because of the processor's high clock frequency, the package provides many Vcc and Vss pins to prevent power surges when multiple outputs change state simultaneously. In addition, certain precautions must be taken with respect to the AHOLD input. If the processor has a pending bus cycle when AHOLD is negated, all of the address drivers turn on almost immediately after AHOLD is negated. If the processor is also driving data on the data bus in the same clock in which BRDY is asserted, it then drives 96 bits simultaneously and ground-bounce spikes can occur. Such ground-bounce spikes can be avoided by following these two rules with respect to AHOLD:

■ Do not negate AHOLD in the same clock that BRDY is 
asserted during a write cycle. 

■ Do not negate AHOLD in the same clock that ADS is 
asserted during a writeback. 

In addition to the above restrictions on driving AHOLD, the 
following general design recommendations apply to power con- 
nections between the processor and the system board: 

■ Connect all Vcc pins to a Vcc plane on your system board.

■ Connect all Vss pins to a GND plane on your system board.

■ Do not drive address and data buses into large capacitive 
loads at high frequencies. This can cause transient power 
surges. 

■ Place decoupling capacitors near the processor.

■ Use low-inductance capacitors and circuit paths, and type 
X7R or better dielectric. 

■ Use capacitors specifically designed for PGA packages. 

■ Tie unused inputs High or Low. 

■ Leave no-connect (NC) pins unconnected. 

■ Connect active-Low inputs to Vcc through a 20-kΩ pullup resistor. This keeps the inputs in a known state while allowing them to be driven during tests.

■ Connect active-High inputs to GND through a pulldown 
resistor. 

■ Keep trace lengths to a minimum. 
6.6 Power-Up Requirements

During power-up, CLK should be toggling and RESET should be asserted as Vcc is ramping toward normal operating voltage. Figure 6-7 shows this timing. After Vcc and CLK reach specification, RESET must be asserted for a minimum of 1 ms to allow the phase-lock loop to synchronize.
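As a worked example of the 1-ms requirement, the minimum number of CLK periods for which RESET must remain asserted is the CLK frequency multiplied by 0.001 s; at a 66-MHz CLK (an example frequency, not a specification from this section) that is 66,000 clocks. A minimal C helper, with a hypothetical name, that rounds up so the count never falls short of 1 ms:

```c
#include <assert.h>
#include <stdint.h>

/* Minimum CLK periods in the 1-ms RESET assertion that follows Vcc and
 * CLK reaching specification. Rounds up to a whole clock. */
uint32_t min_reset_clocks(uint32_t clk_hz)
{
    assert(clk_hz > 0u);
    /* clocks = clk_hz * 0.001 s, computed in integers and rounded up */
    return (uint32_t)(((uint64_t)clk_hz + 999u) / 1000u);
}
```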
6.7 Noise Reduction



Circuit noise can be minimized by the following design rules: 

■ Clock Signal 

• Place the processor as close as possible to the clock 
source. 

• Route the CPUCLK signal on a single PCB layer. Do not 
use vias. 

• Guard-band the CPUCLK signal with twice the minimum 
pitch width to minimize unwanted cross talk. 

■ Capacitors 

• Place all capacitors as near as possible to the processor. 

• Connect the positive sides of all capacitors through vias 
directly to the processor power plane. 

• Connect the negative sides of all capacitors through vias 
to the ground plane. 

• Use tantalum 47-µF and 1-µF capacitors.

• Use ceramic capacitors with low equivalent series resis- 
tance (ESR) ratings at high frequencies and a minimum 
voltage rating of 6 V for all other capacitor values. 

• Place some capacitors very near to the processor, prefer- 
ably on the inside perimeter of the processor socket. 

• Connect bypass capacitors on the top side of the PCB di- 
rectly to the processor’s power pins. 

■ Multilayer Printed-Circuit Boards 

• Use a minimum of four layers — one split power plane, 
one ground plane, two routing planes. 

■ Regulator Circuit 

• Use surface-mounted components placed as near as pos- 
sible to the processor. 

• Use at least three vias to the +5-V power plane for the in- 
put power connection. 

• Use at least three vias to the +3-V processor power plane 
for the output power connection. 

AMD recommends using a split power plane to isolate the pro- 
cessor from the rest of the motherboard. This approach 
reduces noise without additional PCB planes. The split plane
should be made from a portion of copper that is cut out and iso- 
lated from the PCB 5-V power plane. This cutout region sup- 
plies a separate power source for the processor and allows 
installation of bypass decoupling capacitors. The capacitors 
should be placed across the split power plane to provide signal- 
return termination. The processor power plane should overlap 
the output pin of the voltage regulator circuit to provide a low- 
impedance current path. 

The ground plane should never be split because it provides a 
low-impedance current sink and reference. Use generous 
decoupling to ensure that clean power is supplied to the pro- 
cessor. 

6.8 Thermal Design 



In virtually all system designs, the processor’s case tempera- 
ture must be kept cool with some type of heatsink device. Typ- 
ically, the heatsink is combined with an airflow device, such as 
a fan. In general, the trade-off is heat-sink size and cost versus 
airflow quantity and temperature. A small, low-cost heat sink 
requires more airflow than a larger, more efficient heat sink. 

Such cooling products are widely available. For detailed specifications and assistance in selecting a product, contact your AMD field application engineer or browse the AMD home page on the World Wide Web (see Section 6.9 for details).

When gluing a heat sink to the processor case, follow these 
guidelines: 

■ Use thermal paste. This optimizes heat transfer. 

■ Apply the thermal paste in a thin, smooth, even layer across the entire processor package. Do not allow air gaps between the processor package and the heat sink. If an air gap exists, the heat sink will be ineffective.

In addition to the above guidelines for gluing heatsinks to the 
processor, observe the following general design guidelines to 
minimize the adverse effects of system-generated heat on the 
processor and other heat-sensitive system components: 
■ Place the power supply as far away from the processor as
possible. 

■ Place linear devices and regulators away from both the pro- 
cessor and the power supply. 

■ Place high-frequency L2-cache SRAM chips away from both 
the processor and the power supply. 

■ Check the specification for any TTL parts on the board for 
thermal considerations. 

6.9 Design Support and Peripheral Products 



AMD field application engineers (FAEs) can help you solve 
system design problems and select peripheral products that 
are compatible with the AMD-K5 processor. You can locate the 
FAE nearest you by contacting one of the AMD offices listed in 
this manual. You can also find support information on AMD’s 
World Wide Web pages. A list of available Web information is 
given at the AMD home page at the following address: 

http://www.amd.com/
Test and Debug



The AMD-K5 processor has the following modes in which pro- 
cessor and system operation can be tested or debugged: 

■ Hardware Configuration Register (HWCR) — The HWCR is a 
model-specific register that contains configuration bits that 
enable cache, branch tracing, debug, and clock control func- 
tions. 

■ Built-In Self-Test (BIST) — Both normal and test access port (TAP) BIST.

■ Output-Float Test — A test mode that causes the AMD-K5 
processor to float all of its output and bidirectional signals. 

■ Cache and TLB Testing — The Array Access Register (AAR) 
supports writes and reads to any location in the tag and 
data arrays of the processor’s on-chip caches and TLBs. 

■ Debug Registers — Standard 486 debug functions, with an I/O- 
breakpoint extension. 

■ Branch Tracing — A pair of special bus cycles can be driven 
immediately after taken branches to specify information 
about the branch instruction and its target. The Hardware 
Configuration Register (HWCR) provides support for this 
and other debug functions. 

■ Functional Redundancy Checking — Support for real-time 
testing using two processors in a master-checker relation- 
ship. 
■ Test Access Port (TAP) Boundary-Scan Testing — The JTAG
test access functions defined by the IEEE Standard Test 
Access Port and Boundary-Scan Architecture (IEEE 1149.1- 
1990) specification. 

■ Hardware Debug Tool (HDT) — The hardware debug tool 
(HDT), sometimes referred to as the debug port or Probe 
mode, is a collection of signals, registers, and processor 
microcode that is enabled when external debug logic drives 
R/S Low or loads the AMD-K5 processor's Test Access Port
(TAP) instruction register with the USEHDT instruction. 

The test-related signals and their descriptions include the fol- 
lowing: 

■ FLUSH— Page 5-65 

■ FRCMC— Page 5-68 

■ IERR— Page 5-78 

■ INIT— Page 5-81 

■ PRDY— Page 5-103 

■ R/S— Page 5-107

■ RESET— Page 5-109 

■ TCK— Page 5-127 

■ TDI— Page 5-128 

■ TDO— Page 5-129 

■ TMS— Page 5-130 

■ TEST— Page 5-131 

The sections that follow provide details on each of the test and debug features.



7-2 Test and Debug

18524C/0-Nov 1996
AMD-K5 Processor Technical Reference Manual



7.1 Hardware Configuration Register (HWCR) 

The Hardware Configuration Register (HWCR) is a model-spe- 
cific register (MSR) that contains configuration bits that 
enable cache, branch tracing, debug, and clock control func- 
tions. The WRMSR and RDMSR instructions access the HWCR 
when the ECX register contains the value 83h, as described in 
Section 3.3.5 on page 3-33. Figure 7-1 and Table 7-1 show the 
format and fields of the HWCR. 



31 876543210 




Figure 7-1 . Hardware Configuration Register (HWCR) 
Table 7-1. Hardware Configuration Register (HWCR) Fields

Bit    Mnemonic  Description                  Function
31-8   -         -                            reserved
7      DDC       Disable Data Cache           Disables data cache. 0 = enabled, 1 = disabled.
6      DIC       Disable Instruction Cache    Disables instruction cache. 0 = enabled, 1 = disabled.
5      DBP       Disable Branch Prediction    Disables branch prediction. 0 = enabled, 1 = disabled.
4      -         -                            reserved
3-1    DC        Debug Control                Debug control bits:
                                              000 Off (disable HWCR debug control).
                                              001 Enable branch-tracing messages. See Section 7.6 on page 7-17.
                                              010 reserved
                                              011 reserved
                                              100 reserved
                                              101 reserved
                                              110 reserved
                                              111 reserved
0      DSPC      Disable Stopping             Disables stopping of internal processor clocks in the
                 Processor Clocks             Halt and Stop Grant states. 0 = enabled, 1 = disabled.

Notes:
Documentation on the Hardware Debug Tool (HDT) is available from AMD under a nondisclosure agreement.
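Because the HWCR contains reserved bits, changing a field is best done as a read-modify-write. The C sketch below computes only the new register value (the privileged RDMSR and WRMSR with ECX = 83h are not shown); the macro and function names are hypothetical, and the value is treated as the low 32 bits returned in EAX, matching the 31-0 layout of Table 7-1.

```c
#include <assert.h>
#include <stdint.h>

/* HWCR (MSR 83h) fields from Table 7-1; macro names are hypothetical. */
#define HWCR_DSPC       (1u << 0)               /* disable stopping processor clocks */
#define HWCR_DC_SHIFT   1u
#define HWCR_DC_MASK    (7u << HWCR_DC_SHIFT)   /* debug control, bits 3-1 */
#define HWCR_DBP        (1u << 5)               /* disable branch prediction */
#define HWCR_DIC        (1u << 6)               /* disable instruction cache */
#define HWCR_DDC        (1u << 7)               /* disable data cache */

#define HWCR_DC_OFF           0u   /* 000b */
#define HWCR_DC_BRANCH_TRACE  1u   /* 001b; 010b-111b are reserved */

/* Return hwcr (as read with RDMSR) with the debug-control field replaced,
 * leaving every other bit, including reserved bits, unchanged. */
uint32_t hwcr_set_debug_control(uint32_t hwcr, uint32_t dc)
{
    assert(dc == HWCR_DC_OFF || dc == HWCR_DC_BRANCH_TRACE);
    return (hwcr & ~HWCR_DC_MASK) | (dc << HWCR_DC_SHIFT);
}
```

Following Section 7.6, which enables branch tracing by writing 001b to bits 3-1 and setting bit 5, the value to write back would be `hwcr_set_debug_control(hwcr, HWCR_DC_BRANCH_TRACE) | HWCR_DBP`.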
7.2 Built-In Self Test (BIST)



The processor supports the following types of built-in self-test: 

■ Normal BIST — A built-in self-test mode typically used to 
test system functions after RESET 

■ Test Access Port (TAP) BIST — A self-test mode started by the 
TAP instruction, RUNBIST 

All internal arrays except the TLB are tested in parallel by 
hardware. The TLB is tested by microcode. Unlike the Pentium 
processor, the AMD-K5 processor does not report parity errors 
on IERR for every cache or TLB access. Instead, the AMD-K5 
processor fully tests its caches during the BIST. EADS should 
not be asserted during a BIST. The processor accesses the phys- 
ical tag array during BISTs, and these accesses can conflict 
with inquire cycles. 

7.2.1 Normal BIST 

The normal BIST is invoked if INIT is asserted at the falling 
edge of RESET. The BIST runs tests on the internal hardware 
that exercise the following resources: 

■ Instruction cache: 

• Linear tag directory 

• Instruction array 

• Physical tag directory 

■ Data cache: 

• Linear tag directory 

• Data array 

• Physical tag directory 

■ Entry-point and instruction-decode PLAs 

■ Microcode ROM 

■ TLB 

The BIST runs a linear feedback shift register (LFSR) signature test on the microcode ROM in parallel with a March C test on the instruction cache, data cache, and physical tags. This is followed by the March C test on the TLB arrays and then an LFSR signature test on the PLA, in that order. Upon completion of the PLA test, the processor transfers the test result from an internal Hardware Debug Tool (HDT) data register to the EAX register for external access, resets the internal microcode, and begins normal code fetching.

The result of the BIST can be accessed by reading the lower 9 
bits of the EAX register. If the EAX register value is 
0000_0000h, the test completed successfully. If the value is not 
zero, the non-zero bits indicate where the failure occurred, as 
shown in Table 7-2. The processor continues with its normal 
boot process after the BIST completes, whether the BIST 
passed or failed. 



Table 7-2. BIST Error Bit Definition in EAX Register

Bit Number   Bit Value 0   Bit Value 1
31-9         No Error      Always 0
8            No Error      Data path
7            No Error      Instruction-cache instructions
6            No Error      Instruction-cache linear tags
5            No Error      Data-cache linear tags
4            No Error      PLA
3            No Error      Microcode ROM
2            No Error      Data-cache data
1            No Error      Instruction-cache physical tags
0            No Error      Data-cache physical tags
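A diagnostic routine might translate the BIST result in EAX into the failing units listed in Table 7-2. A minimal C sketch with hypothetical names; it examines only bits 8-0, since bits 31-9 are always 0:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Failing unit for each EAX bit 8-0 after BIST (Table 7-2). */
static const char *const bist_fail_names[9] = {
    "data-cache physical tags",          /* bit 0 */
    "instruction-cache physical tags",   /* bit 1 */
    "data-cache data",                   /* bit 2 */
    "microcode ROM",                     /* bit 3 */
    "PLA",                               /* bit 4 */
    "data-cache linear tags",            /* bit 5 */
    "instruction-cache linear tags",     /* bit 6 */
    "instruction-cache instructions",    /* bit 7 */
    "data path",                         /* bit 8 */
};

/* Store up to cap failure names in out[] and return the total number of
 * failing units flagged in eax; 0 means the BIST passed. */
size_t bist_failures(uint32_t eax, const char *out[], size_t cap)
{
    size_t n = 0;
    assert(out != NULL || cap == 0);
    eax &= 0x1FFu;  /* results are in bits 8-0; bits 31-9 are always 0 */
    for (unsigned bit = 0; bit < 9u; bit++) {
        if ((eax >> bit) & 1u) {
            if (n < cap)
                out[n] = bist_fail_names[bit];
            n++;
        }
    }
    return n;
}
```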



7.2.2 Test Access Port (TAP) BIST 

The TAP BIST performs all of the functions of the normal BIST, up to and including the PLA signature test, in exactly the same manner as the normal BIST. However, after the PLA test, the test result is not transferred to the EAX register.

The TAP BIST is started by loading and executing the RUN- 
BIST instruction in the test access port, as described in Section 
7.8 on page 7-19. When the RUNBIST instruction is executed, 
the processor enters into a reset mode that is identical to that 
entered when the RESET signal is asserted. Upon completion 
of the TAP BIST, the result remains in the BIST result register
for shifting out through the TDO signal. The TRST signal must 
be asserted or the TAP instruction must be changed in order to 
exit TAP BIST and return to normal operation. 

7.3 Output-Float Test 



The Output-Float Test mode is entered if FLUSH is asserted 
before the falling edge of RESET. This causes the processor to 
place all of its output and bidirectional signals in the high- 
impedance state. In this isolated state, system board traces and 
connections can be tested for integrity and driveability. The 
Output-Float Test mode can only be exited by asserting RESET 
again. 

On the AMD-K5 and Pentium processors, FLUSH is an edge- 
triggered interrupt. On the 486 processor, however, the signal 
is a level-sensitive input. 

7.4 Cache and TLB Testing 



Cache and TLB testing is often done by the BIOS or operating 
system during power-up. These arrays can be tested using the 
Array Access Register (AAR). The following tests can be per- 
formed: 

■ Data Cache — 8-Kbyte, 4-way, set associative 

• Data array 

• Linear-tag array 

• Physical-tag array 

■ Instruction Cache — 16-Kbyte, 4-way, set associative 

• Instruction array 

• Linear-tag array 

• Physical-tag array 

• Valid-bit array 

• Branch-prediction bit array 
■ 4-Kbyte TLB — 128-entry, 4-way, set associative

• Linear-tag array 

• Page array 

■ 4-Mbyte TLB — 4-entry, fully associative 

• Linear-tag array 

• Page array 

7.4.1 Array Access Register (AAR) 

The 64-bit Array Access Register (AAR) is a model-specific 
register (MSR) that contains a 32-bit array pointer, which iden- 
tifies the array location to be tested, and 32 bits of array test 
data to be read or written. The WRMSR and RDMSR instruc- 
tions access the AAR when the ECX register contains the value 
82h, as described in Section 3.3.5 on page 3-33. Figure 7-2 
shows the format of the AAR. 



[Figure: MSR 82h. Array Pointer, bits 31-0 (contents of EDX); Array Data, bits 31-0 (contents of EAX).]

Figure 7-2. Array Access Register (AAR)



To read or write an array location, perform the following steps: 

1. ECX — Enter 82h into ECX to access the 64-bit AAR. 

2. EDX — Enter a 32-bit array pointer into EDX, as shown in 
Figures 7-3 through 7-8 (top). 

3. EAX — To write, load 32 bits of array test data into EAX and execute WRMSR; to read, execute RDMSR and read the 32 bits of array test data from EAX, as shown in Figures 7-3 through 7-8 (bottom).
7.4.2 Array Pointer

The array pointers entered in EDX (Figures 7-3 through 7-8, top) specify particular array locations. For example, in the data- and instruction-cache arrays, the way (or column) and set (or index) in the array pointer specify a cache line in the 4-way, set-associative array. The array pointers for data-cache data and instruction-cache instructions further specify a dword location within that cache line. In the data cache, this dword is 32 bits of data. In the instruction cache, this dword is two instruction bytes plus their associated pre-decode bits. For the 4-Kbyte TLB, the way and set specify one of the 128 TLB entries. For the 4-Mbyte TLB, one of only four entries is specified.

Bits 7-0 of every array pointer encode the array ID, which identifies the array to be accessed, as shown in Table 7-3. To simplify multiple accesses to an array, the contents of EDX are retained after the RDMSR instruction executes (EDX is normally cleared after an RDMSR instruction).



Table 7-3. Array IDs in Array Pointers

Array Pointer Bits 7-0   Accessed Array
E0h                      Data Cache: Data
E1h                      Data Cache: Linear Tag
ECh                      Data Cache: Physical Tag
E4h                      Instruction Cache: Instructions
E5h                      Instruction Cache: Linear Tag
EDh                      Instruction Cache: Physical Tag
E6h                      Instruction Cache: Valid Bits
E7h                      Instruction Cache: Branch-Prediction Bits
E8h                      4-Kbyte TLB: Page
E9h                      4-Kbyte TLB: Linear Tag
EAh                      4-Mbyte TLB: Page
EBh                      4-Mbyte TLB: Linear Tag
7.4.3 Array Test Data

EAX specifies the test data to be read or written with the RDMSR or WRMSR instruction (see Figures 7-3 through 7-8). For example, in Figure 7-3 (top) the array pointer in EDX specifies a way and set within the data-cache linear tag array (E1h in bits 7-0 of the array pointer) or the physical tag array (ECh in bits 7-0 of the array pointer). If the linear tag array (E1h) were accessed, the data read or written would include the tag and the status bits. The details of the valid fields in EAX are shown in Appendix A of the AMD-K5 Processor Software Development Guide, order# 20007.



EDX: Array Pointer
  Bits 31-30  0
  Bits 29-28  Way
  Bits 27-19  0
  Bits 18-13  Set
  Bits 12-8   0
  Bits 7-0    Array ID (E1h, ECh)

EAX: Test Data
  (E1h) Linear Tag:    bits 31-28 = 0; bits 27-0 = Valid Bits
  (ECh) Physical Tag:  bits 31-23 = 0; bits 22-0 = Valid Bits

Figure 7-3. Test Formats: Data-Cache Tags
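The data-cache tag pointer layout of Figure 7-3 can be expressed as a small encoder. This C sketch only builds the EDX value (loading ECX with 82h and executing RDMSR or WRMSR requires privileged code and is not shown); the function and macro names are hypothetical, and the argument checks assume the 2-bit way field and 6-bit set field shown in the figure.

```c
#include <assert.h>
#include <stdint.h>

/* Array IDs from Table 7-3 that use the Figure 7-3 pointer format. */
#define AAR_ID_DC_LINEAR_TAG  0xE1u
#define AAR_ID_DC_PHYS_TAG    0xECu

/* EAX bits that carry tag and status data (Figure 7-3); others are 0. */
#define AAR_DC_LINEAR_TAG_MASK  0x0FFFFFFFu   /* bits 27-0 */
#define AAR_DC_PHYS_TAG_MASK    0x007FFFFFu   /* bits 22-0 */

/* Build the EDX array pointer for a data-cache tag access:
 * bits 29-28 = way, bits 18-13 = set, bits 7-0 = array ID, all else 0. */
uint32_t aar_dc_tag_pointer(unsigned way, unsigned set, uint32_t array_id)
{
    assert(way < 4u);    /* 2-bit way field */
    assert(set < 64u);   /* 6-bit set field */
    assert(array_id == AAR_ID_DC_LINEAR_TAG || array_id == AAR_ID_DC_PHYS_TAG);
    return ((uint32_t)way << 28) | ((uint32_t)set << 13) | array_id;
}
```

For example, way 3, set 63 of the physical tag array yields the pointer 3007_E0ECh.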
EAX: Test Data
  Bits 31-0

Figure 7-4. Test Formats: Data-Cache Data
EAX: Test Data
  (E5h) Linear Tag:              bits 31-20 = 0; bits 19-0 = Valid Bits
  (EDh) Physical Tag:            bits 31-21 = 0; bits 20-0 = Valid Bits
  (E6h) Valid Bits:              bits 31-19 = 0; bits 18-0 = Valid Bits
  (E7h) Branch-Prediction Bits:  bits 31-19 = 0; bits 18-0 = Valid Bits

Figure 7-5. Test Formats: Instruction-Cache Tags



EDX: Array Pointer
  Bits 31-30  0
  Bits 29-28  Way
  Bits 27-20  0
  Bits 19-12  Set
  Bits 11-9   Opcode Bytes
  Bit 8       0
  Bits 7-0    Array ID (E4h)

EAX: Test Data
  (E4h) Instruction Bytes:  bits 31-26 = 0; bits 25-0 = Valid Bits

Figure 7-6. Test Formats: Instruction-Cache Instructions



EAX: Test Data
  (E8h) 4-Kbyte Page and Status:  bits 31-22 = 0; bits 21-0 = Valid Bits
  (E9h) 4-Kbyte Linear Tag:       bits 31-20 = 0; bits 19-0 = Valid Bits

Figure 7-7. Test Formats: 4-Kbyte TLB
EDX: Array Pointer
  Bits 31-30  0
  Bits 29-28  Entry
  Bits 27-8   0
  Bits 7-0    Array ID (EAh, EBh)

EAX: Test Data
  (EAh) 4-Mbyte Page and Status:  bits 31-12 = 0; bits 11-0 = Valid Bits
  (EBh) 4-Mbyte Linear Tag:       bits 31-15 = 0; bits 14-0 = Valid Bits

Figure 7-8. Test Formats: 4-Mbyte TLB
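A comparable sketch for the 4-Mbyte TLB formats of Figure 7-8 follows, again computing values only and using hypothetical names. The masks mark the EAX bits that the figure shows as carrying data; the remaining bits are 0.

```c
#include <assert.h>
#include <stdint.h>

/* Array IDs from Table 7-3 for the 4-Mbyte TLB. */
#define AAR_ID_4M_TLB_PAGE        0xEAu
#define AAR_ID_4M_TLB_LINEAR_TAG  0xEBu

/* EDX array pointer: bits 29-28 select one of the four entries,
 * bits 7-0 hold the array ID, and bits 31-30 and 27-8 are 0. */
uint32_t aar_4m_tlb_pointer(unsigned entry, uint32_t array_id)
{
    assert(entry < 4u);
    assert(array_id == AAR_ID_4M_TLB_PAGE || array_id == AAR_ID_4M_TLB_LINEAR_TAG);
    return ((uint32_t)entry << 28) | array_id;
}

/* EAX bits that carry data for each array; the remaining bits are 0. */
uint32_t aar_4m_tlb_data_mask(uint32_t array_id)
{
    assert(array_id == AAR_ID_4M_TLB_PAGE || array_id == AAR_ID_4M_TLB_LINEAR_TAG);
    return array_id == AAR_ID_4M_TLB_PAGE ? 0x00000FFFu   /* page and status: bits 11-0 */
                                          : 0x00007FFFu;  /* linear tag: bits 14-0 */
}
```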



7.5 Debug Registers



The processor implements the standard debug functions and 
registers — DR7-DR6 and DR3-DR0 (often called DR7-DR0) — 
that are available on the 486 processor, plus an I/O breakpoint 
extension. 

7.5.1 Standard Debug Functions 

The debug functions make the processor's state visible to debug software through four debug registers (DR3-DR0) that are accessed by MOV instructions. Memory addresses can be set as breakpoints, so that an instruction or data access to one of those addresses invokes one of two debug exceptions (interrupt vectors 1 or 3). The debug functions eliminate the need to embed breakpoints in code and allow debugging of ROM as well as RAM.

For details on the standard 486 debug functions and registers, 
see the AMD documentation on the Am486® processor or other 
commercial x86 literature. 

7.5.2 I/O Breakpoint Extension 

The processor supports an I/O breakpoint extension for break- 
points on I/O reads and writes. This function is enabled by set- 
ting bit 3 of CR4, as described in Section 3.1 on page 3-2. When 
enabled, the I/O breakpoint function is invoked by the follow- 
ing: 

■ Entering the I/O port number as a breakpoint address (zero- 
extended to 32 bits) in one of the breakpoint registers, 
DR3-DR0 

■ Entering the bit pattern, 10b, in the corresponding 2-bit 
R/W field in DR7 
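As an illustration, the DR7 bits for an I/O breakpoint can be computed as below. The Ln enable, R/Wn, and LENn field positions and the LEN encodings are the standard x86 DR7 layout rather than something stated in this section, and the function name is hypothetical. CR4 bit 3 must also be set, and the I/O port number placed in the corresponding DRn register, before the breakpoint takes effect.

```c
#include <assert.h>
#include <stdint.h>

/* DR7 bits that locally enable breakpoint n (DR0-DR3) as an I/O breakpoint.
 * len_code follows the usual x86 LEN encoding: 00b = 1 byte, 01b = 2 bytes,
 * 11b = 4 bytes. R/W = 10b selects I/O only when CR4 bit 3 is set. */
uint32_t dr7_io_breakpoint(unsigned n, unsigned len_code)
{
    assert(n < 4u);
    assert(len_code == 0u || len_code == 1u || len_code == 3u);
    uint32_t local_enable = 1u << (2u * n);                        /* Ln */
    uint32_t rw           = 2u << (16u + 4u * n);                  /* R/Wn = 10b */
    uint32_t len          = (uint32_t)len_code << (18u + 4u * n);  /* LENn */
    return local_enable | rw | len;
}
```

For example, a 1-byte I/O breakpoint in DR0 corresponds to DR7 bits 0002_0001h.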

All data breakpoints on the AMD-K5 processor are precise, 
including those encountered in repeated string operations, 
which trap after completing the iteration on which the break- 
point match occurs. 
Enabled breakpoints slow the processor somewhat. When a
data breakpoint is enabled, the processor disables its dual- 
issue load/store operations and performs only single-issue load/ 
store operations. When an instruction breakpoint is enabled, 
instruction issue is completely serialized. 

7.5.3 Debug Compatibility with Pentium Processor 

The differences in debug functions between the AMD-K5 and Pentium processors are described in Section A.7 on page A-15.

7.6 Branch Tracing 



Branch tracing is enabled by writing 001b to bits 3-1 and setting bit 5 to 1 in the Hardware Configuration Register (HWCR), as described in Section 7.1 on page 7-3. When branch tracing is enabled, the processor drives two branch-trace message special bus cycles immediately after each taken branch instruction is executed. Both special bus cycles have a BE7-BE0 encoding of DFh (1101_1111b). The first special bus cycle identifies the branch source; the second identifies the branch target. The contents of the address and data buses during these special bus cycles are shown in Table 7-4.

The branch-trace message special bus cycles are different for 
the AMD-K5 and Pentium processors, although their BE7-BE0 
encodings are the same. 



Table 7-4. Branch-Trace Message Special Bus Cycle Fields

Signals   First Special Bus Cycle                        Second Special Bus Cycle
A31       0 = first special bus cycle (source)           1 = second special bus cycle (target)
A30-A29   not valid                                      Operating Mode of Target:
                                                         11 = Virtual-8086 Mode
                                                         10 = Protected Mode
                                                         01 = not valid
                                                         00 = Real Mode
A28       not valid                                      Default Operand Size of Target Segment:
                                                         1 = 32-bit
                                                         0 = 16-bit
A27-A20   0                                              0
A19-A4    Code Segment (CS) selector of Branch Source.   Code Segment (CS) selector of Branch Target.
A3        0                                              0
D31-D0    EIP of Branch Source.                          EIP of Branch Target.
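A logic-analyzer post-processor might decode the two branch-trace cycles of Table 7-4 as follows. In this C sketch (hypothetical names), addr holds the sampled A31-A3 in their natural bit positions and data holds D31-D0; the mode and operand-size fields are meaningful only in the second (target) cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    MODE_REAL = 0,        /* 00b */
    MODE_NOT_VALID = 1,   /* 01b */
    MODE_PROTECTED = 2,   /* 10b */
    MODE_V86 = 3          /* 11b */
} trace_mode;

typedef struct {
    bool       is_target;  /* A31: 0 = source cycle, 1 = target cycle */
    trace_mode mode;       /* A30-A29: meaningful in the target cycle only */
    bool       op32;       /* A28: 1 = 32-bit default operand size (target only) */
    uint16_t   cs;         /* A19-A4: CS selector */
    uint32_t   eip;        /* D31-D0: EIP */
} branch_trace_msg;

/* Decode one branch-trace special bus cycle (BE7-BE0 = DFh). */
branch_trace_msg decode_branch_trace(uint32_t addr, uint32_t data)
{
    /* A27-A20 and A3 are driven as 0 in both cycles. */
    assert((addr & 0x0FF00008u) == 0u);
    branch_trace_msg m;
    m.is_target = ((addr >> 31) & 1u) != 0u;
    m.mode      = (trace_mode)((addr >> 29) & 3u);
    m.op32      = ((addr >> 28) & 1u) != 0u;
    m.cs        = (uint16_t)((addr >> 4) & 0xFFFFu);
    m.eip       = data;
    return m;
}
```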



7.7 Functional-Redundancy Checking 



If FRCMC is asserted at RESET, the processor enters Func- 
tional-Redundancy Checking mode as the checker, and reports 
checking errors on the IERR output. If FRCMC is negated at 
RESET, the processor operates normally, although it also 
behaves as the master in a functional-redundancy checking 
arrangement with a checker. 

In the Functional-Redundancy Checking mode, two processors 
have their signals tied together. One processor (the master) 
operates normally. The other processor (the checker) has its 
output and bidirectional signals (except for TDO and IERR) 
floated to detect the state of the master’s signals. The master 
controls instruction fetching and the checker mimics its behav- 
ior by sampling the fetched instructions as they appear on the 
bus. Both processors execute the instructions in lock step. The 
checker compares the state of the master’s output and bidirec- 
tional signals with the state that the checker itself would have 
driven for the same instruction stream. 
Errors detected by the checker are reported on the IERR output of the checker. If a mismatch occurs on such a comparison,
the checker asserts IERR for one clock, two clocks after the 
detection of the error. Both the master and the checker con- 
tinue running the checking program after an error occurs. No 
action other than the assertion of IERR is taken by the proces- 
sor. On the AMD-K5 processor, the IERR output is reserved 
solely for functional-redundancy checking. No other errors are 
reported on that output. 

Functional-redundancy checking is typically implemented on 
single-processor, fault-monitoring systems (which actually 
have two processors). The master processor runs the opera- 
tional programs and the checker processor is dedicated 
entirely to constant checking. In this arrangement, the test of 
accurate operation consists solely of reporting one or more 
errors. The particular type of error or the instruction causing 
an error is not reported. The arrangement works because the 
processor is entirely deterministic. Speculative prefetching, 
speculative execution, and cache replacement all occur in 
identical ways and at identical times on both processors if their 
signals are tied together so that they run the same program. 

The Functional-Redundancy Checking mode can only be 
exited by the assertion of RESET. Functional-redundancy 
checking cannot be performed in the Hardware Debug Tool 
(HDT) mode. The assertion of FRCMC is not recognized while PRDY is asserted.

7.8 Boundary-Scan Test Access Port (TAP) 



The boundary-scan Test Access Port (TAP) — originally pro- 
posed by the Joint European Test Action Group (JETAG) and, 
later, Joint Test Action Group (JTAG) — is an IEEE standard 
that defines synchronous scanning test methods for complex 
logic circuits, such as boards containing a microprocessor. The 
AMD-K5 processor supports the full TAP standard defined in 
the IEEE Standard Test Access Port and Boundary-Scan Architec- 
ture (IEEE 1149.1-1990) specification. 



The TAP consists of the following:

■ Test Access Port (TAP) Controller — A synchronous, finite 
state machine that decodes the inputs on the TMS signal to 
control a sequence of test operations. 

■ Instruction Register (IR) — Accepts serially shifted instruc- 
tions from the TDI input. The instructions select the test or 
debug operation to be performed, the Test Data Register 
(TDR) to be accessed, or both. 

■ Test Data Registers (TDRs) — Used to process the test data. 
Each TDR is addressed by an instruction in the Instruction 
Register (IR). The processor includes the following TDRs: 

• Boundary Scan Register (BSR) — Contains cells connected 
to all of the processor’s input and output signals as well 
as cells for I/O float control. It allows serial data to be 
written into or read from the processor boundary. The 
register is controlled with the EXTEST and SAMPLE in- 
structions. 

• Device Identification Register (DIR) — Contains the codes 
for manufacturer's identification, part number, and ver- 
sion. 

• Bypass Register (BR) — A path between TDI and TDO 
used to transfer test data to and from other board com- 
ponents when no test operation is being performed by 
the processor. 

• Hardware Debug Tool Register (HDTR) — Selected by the 
USEHDT instruction to connect TDI and TDO, allowing 
HDT instructions to be executed. 

• Built-In Self-Test Result Register (BISTRR) — Selected by 
the RUNBIST instruction to connect TDI and TDO, al- 
lowing the result of executing the RUNBIST to be 
shifted out after the completion of BIST. 

■ The test signals are as follows: 

• TCK — The clock for all TAP testing 

• TDI — Input test data and instructions 

• TDO — Output data 

• TMS — Test functions and sequence of test changes 

• TRST — Test reset 

Boundary-scan testing uses shift registers in boundary-scan cells located between the processor’s internal logic and I/O buffers to control and observe the behavior of signals at each pin. The boundary-scan cells form a serial shift-register chain, called a Boundary Scan Register (BSR), around the processor’s internal logic. Test data is shifted through the boundary-scan chain by a test program. If all the components on a board implement this boundary-scan architecture, a single serial path can be used to test component interconnections.

Parallel output registers are fed by the shift registers. Parallel data is loaded into the shift register when the TAP controller exits the capture state (capture_DR or capture_IR). The shift registers then shift data from TDI to TDO in the shift state (shift_DR or shift_IR). The parallel output registers hold the current data while new data is shifted into the shift registers. The output registers are updated when the controller exits the update state (update_DR or update_IR).
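The capture, shift and update sequence above can be sketched as a small Python model of the standard IEEE 1149.1 TAP controller state diagram driving one data register. This is a behavioral sketch under stated assumptions, not the processor's implementation. It ignores TCK edge timing (for example, TDO changing on the falling edge). The class and function names are hypothetical. Following the description above, each state's action is applied as the controller leaves that state.

```python
# Behavioral sketch of an IEEE 1149.1 TAP controller (not cycle-accurate).
# NEXT[state] = (next state when TMS=0, next state when TMS=1)
NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN": ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR": ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR": ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR": ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR": ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR": ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN": ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR": ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR": ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR": ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR": ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR": ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}


class TapDataRegisterModel:
    """Hypothetical model of one data register behind a TAP controller."""

    def __init__(self, capture_value, length):
        self.state = "TEST_LOGIC_RESET"
        self.length = length
        self.capture_value = capture_value  # parallel data loaded on capture
        self.shift_reg = 0
        self.output_reg = 0                 # parallel output register

    def clock(self, tms, tdi=0):
        """Apply one TCK: do the current state's action, then move on TMS."""
        tdo = None
        if self.state == "CAPTURE_DR":
            self.shift_reg = self.capture_value
        elif self.state == "SHIFT_DR":
            tdo = self.shift_reg & 1  # least-significant bit leaves on TDO first
            self.shift_reg = (self.shift_reg >> 1) | (tdi << (self.length - 1))
        elif self.state == "UPDATE_DR":
            self.output_reg = self.shift_reg
        self.state = NEXT[self.state][tms]
        return tdo


def scan_dr(tap, value_in):
    """Shift value_in in (LSB first) and return the captured value.

    Assumes the controller is in Run-Test/Idle when called.
    """
    tap.clock(1)  # Run-Test/Idle -> Select-DR-Scan
    tap.clock(0)  # Select-DR-Scan -> Capture-DR
    tap.clock(0)  # capture parallel data, -> Shift-DR
    value_out = 0
    for i in range(tap.length):
        last = i == tap.length - 1
        # TMS=1 on the last bit moves the controller to Exit1-DR
        tdo = tap.clock(1 if last else 0, (value_in >> i) & 1)
        value_out |= tdo << i
    tap.clock(1)  # Exit1-DR -> Update-DR
    tap.clock(0)  # update the output register, -> Run-Test/Idle
    return value_out
```

For example, a 32-bit register created with `TapDataRegisterModel(0x12345678, 32)` and clocked once with TMS Low reaches Run-Test/Idle. A subsequent `scan_dr(tap, 0xCAFEF00D)` returns 0x12345678 and leaves 0xCAFEF00D in the output register. Five TCK cycles with TMS High return the controller to Test-Logic-Reset from any state.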

The sections below describe only those aspects of the IEEE 
standard that are implemented uniquely by the AMD-K5 pro- 
cessor. For a description of the IEEE-mandatory TAP functions 
and the IEEE optional functions implemented by the AMD-K5 
processor, see the IEEE Standard Test Access Port and Boundary- 
Scan Architecture (IEEE 1149.1-1990) specification. 

7.8.1 Device Identification Register 

The format of the Device Identification Register (DIR) is 
shown in Table 7-5. The fields include the following values: 

■ Version Number — This is incremented by AMD manufactur- 
ing for each major revision of silicon. 

■ Bond Option — The two bits of the bond option depend on 
how the part is bonded at the factory. 

■ Part Number — This identifies the specific processor model. 

■ Manufacturer — The manufacturer identity occupies only 11 bits (bits 11-1). The least-significant bit, bit 0, is always set to 1, as specified by the IEEE standard.



Table 7-5. Test Access Port (TAP) ID Code

| Version (Bits 31-28) | Bond Option (Bits 27-26) | Part Number (Bits 25-12) | Manufacturer (Bits 11-0) |
| Xh | XXb | 05XXh | 001h |
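The field layout in Table 7-5 can be decoded with shifts and masks. The sketch below assumes a 32-bit value read with the IDCODE instruction. The function names are hypothetical, and the sample value used in the usage note is invented to fit the table's fixed fields. It does not come from a real part. The manufacturer field is returned whole (bits 11-0) so that it compares directly with the 001h shown in the table.

```python
def decode_idcode(idcode):
    """Split a 32-bit TAP ID code into the Table 7-5 fields (sketch)."""
    if not 0 <= idcode <= 0xFFFF_FFFF:
        raise ValueError("ID code must be a 32-bit value")
    return {
        "version": (idcode >> 28) & 0xF,         # bits 31-28
        "bond_option": (idcode >> 26) & 0x3,     # bits 27-26
        "part_number": (idcode >> 12) & 0x3FFF,  # bits 25-12 (14 bits)
        "manufacturer": idcode & 0xFFF,          # bits 11-0; bit 0 is always 1
    }


def matches_k5_fixed_fields(fields):
    """Check the fixed parts of Table 7-5: part number 05XXh, manufacturer 001h."""
    return (fields["part_number"] >> 8) == 0x05 and fields["manufacturer"] == 0x001
```

With the invented value 0x18512001, the decoded fields are version 1, bond option 10b, part number 512h and manufacturer field 001h. These fields pass the fixed-field check.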



7.8.2 Public Instructions

The processor supports all three IEEE-mandatory instructions (BYPASS, SAMPLE/PRELOAD, EXTEST), three IEEE-optional instructions (IDCODE, HIGHZ, RUNBIST), and three instructions unique to the AMD-K5 processor (ALL1, ALL0, USEHDT). Table 7-6 shows the complete set of public TAP instructions supported by the processor. In addition, the processor implements several private manufacturing test instructions.

The IEEE standard describes the mandatory and optional instructions. The ALL1 and ALL0 instructions force all outputs and bidirectional pins High or Low, respectively. The USEHDT instruction is described below. Any instruction encodings not shown in Table 7-6 select the BYPASS instruction.



Table 7-6. Public TAP Instructions

| Instruction | Encoding | Register | Description |
| EXTEST | 00000 | BSR | As defined by the IEEE standard |
| SAMPLE/PRELOAD | 00001 | BSR | As defined by the IEEE standard |
| IDCODE | 00010 | DIR | As defined by the IEEE standard |
| HIGHZ | 00011 | BR | As defined by the IEEE standard |
| ALL1 | 00100 | BR | Forces all outputs and bidirectionals High |
| ALL0 | 00101 | BR | Forces all outputs and bidirectionals Low |
| USEHDT | 00110 | HDTR | Accesses the Hardware Debug Tool (HDT)(1); see Section 7.9 on page 7-23 |
| RUNBIST | 00111 | BISTRR | As defined by the IEEE standard |
| BYPASS | 11111 | BR | As defined by the IEEE standard |
| BYPASS | undefined | BR | Undefined instruction encodings select the BYPASS instruction |

Notes:

1. Documentation on the Hardware Debug Tool (HDT) is available from AMD under a nondisclosure agreement.
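The encoding rules in Table 7-6 amount to a lookup with a BYPASS default. The following sketch models only the public instructions. The private manufacturing test instructions mentioned above are not modeled, so their encodings fall through to BYPASS here. The 5-bit instruction-register width is inferred from the encodings in the table, and the names are hypothetical.

```python
# Public TAP instructions from Table 7-6: encoding -> (instruction, selected register)
PUBLIC_TAP_INSTRUCTIONS = {
    0b00000: ("EXTEST", "BSR"),
    0b00001: ("SAMPLE/PRELOAD", "BSR"),
    0b00010: ("IDCODE", "DIR"),
    0b00011: ("HIGHZ", "BR"),
    0b00100: ("ALL1", "BR"),
    0b00101: ("ALL0", "BR"),
    0b00110: ("USEHDT", "HDTR"),
    0b00111: ("RUNBIST", "BISTRR"),
    0b11111: ("BYPASS", "BR"),
}

IR_WIDTH = 5  # inferred from the 5-bit encodings in Table 7-6


def decode_tap_instruction(encoding):
    """Return (instruction, register); encodings not in the table select BYPASS."""
    if not 0 <= encoding < (1 << IR_WIDTH):
        raise ValueError(f"encoding must fit in {IR_WIDTH} bits")
    return PUBLIC_TAP_INSTRUCTIONS.get(encoding, ("BYPASS", "BR"))
```

For example, encoding 00010b selects IDCODE and the DIR, and an encoding such as 10101b, which is not in the table, selects BYPASS and the BR.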
7.9 Hardware Debug Tool (HDT)



The Hardware Debug Tool (HDT), sometimes referred to as the debug port or Probe mode, is a collection of signals, registers, and processor microcode that is enabled when external debug logic drives R/S Low or loads the processor’s Test Access Port (TAP) instruction register with the USEHDT instruction.

Documentation on the HDT is available under nondisclosure 
agreement to test and debug developers. For information, con- 
tact your AMD sales representative or field application engi- 
neer. 



Appendix A

Compatibility With the 
Pentium and 486 Processors 



The AMD-K5 processor is compatible with the existing Pen- 
tium-class hardware and software infrastructure, including 
chipsets, motherboards, operating systems, and applications 
software. In particular, the following AMD-K5 processor fea- 
tures are compatible with the Pentium processor: 

■ Package and pinout 

■ Electrical interface (including bus cycles, AC and DC 
parameters, interrupt handling, power saving, etc.) 

■ Instruction set, programming model, memory management, 
etc. 

Because the AMD-K5 processor takes a different approach to 
implementing the x86 architecture, there are a few subtle dif- 
ferences between the Pentium and AMD-K5 processors. This 
appendix describes these differences. (For the most current 
list of differences, see the Comparison of the AMD-K5 and Pen- 
tium Processors application note, order# 20025.) 






A.1 Bus Signals 



A.1.1 Signal Comparison

Table A-1 compares the signals on the Pentium processor with those on the AMD-K5 processor, showing which signals are supported on each processor.



Table A-1. AMD-K5 and Pentium Processor Signal Comparison

| Signal | Pentium (735\90, 815\100) | AMD-K5 | Function |
| A20M | X | X | Address Bit 20 Mask |
| A31-A3 | X | X | Address Bus |
| ADS | X | X | Address Strobe |
| ADSC | X | X | Address Strobe |
| AHOLD | X | X | Address Hold |
| AP | X | X | Address Parity |
| APCHK | X | X | Address Parity Check |
| APICEN | X | | APIC Enable (High during RESET) |
| PICD1 | X | | PIC Data 1 |
| BE7-BE0 | X | X | Byte Enables |
| Flush(4) | X | | Dual-Processor Flush |
| APICID3-APICID0 | X | | APIC ID (during reset) |
| BF (BF1-BF0) | X | X | Bus-to-Core Frequency Ratio |
| BOFF | X | X | Bus Backoff |
| BP3-BP2 | X | | Breakpoint 3 to 2 |
| BP1-BP0/PM1-PM0 | X | | Breakpoint 1 to 0 or Performance Monitor 1 to 0 |
| BRDY | X | X | Burst Ready |
| BRDYC | X | | Drive-Strength Control (during RESET) |
| | X | X | Burst Ready |
| BREQ | X | X | Bus Request |
| BUSCHK | X | X | Drive-Strength Control (during RESET) |
| | X | X | Bus Check |
| CACHE | X | X | Cacheable Cycle |
| CLK | X | X | System Clock (5 V-tolerant) |
| CPUTYP | X | | Primary or Secondary Processor |
| D/C | X | X | Data or Code Cycle |
| D63-D0 | X | X | Data Bus |
| D/P | X | | Dual or Primary Processor Cycle |
| DP7-DP0 | X | X | Data Parity |
| DPEN | X | | Dual Processor Present (during RESET) |
| PICD0 | X | | PIC Data 0 |
| EADS | X | X | External Address Strobe |
| EWBE | X | X | External Write Buffer Empty |
| FERR | X | X | Floating-Point Error |
| FLUSH | X | X | Float-Test Mode (during RESET) |
| | X | X | Writeback and Invalidate Caches |
| FRCMC | X | X | Functional Redundancy Checking Master/Checker |
| HIT | X | X | Inquire Hit |
| HITM | X | X | Inquire Hit to Modified Line |
| HLDA | X | X | Hold Acknowledge |
| HOLD | X | X | Hold |
| IERR | X | X | Internal Error |
| IGNNE | X | X | Ignore Numeric Error |
| INIT | X | X | Execute BIST (during RESET) |
| | X | X | Initialize (warm start) |
| INV | X | X | Invalid or Shared After Inquire Cycle |
| KEN | X | X | Cache Enable |
| LINT0/INTR | X | | Local Interrupt 0 (APIC enabled) |
| | X | X | Maskable Interrupt |
| LINT1/NMI | X | | Local Interrupt 1 (APIC enabled) |
| | X | X | Non-Maskable Interrupt |
| LOCK | X | X | Locked Cycle |
| M/IO | X | X | Memory or I/O Cycle |
| NA | X | X | Next (pipelined) Address |
| PBGNT | X | | Private Bus Grant |
| PBREQ | X | | Private Bus Request |
| PCD | X | X | Page Cache Disable |
| PCHK | X | X | Parity Check |
| PEN | X | X | Parity Enable |
| PHIT | X | | Private Hit |
| PHITM | X | | Private Hit to Modified Line |
| PICCLK | X | | PIC clock, 5 V-Tolerant |
| PRDY | X | X | Probe Ready |
| PWT | X | X | Page Writethrough |
| R/S | X | X | Run or Stop |
| RESET | X | X | Reset |
| SCYC | X | X | Misaligned Transfer |
| SMI | X | X | System Management Interrupt |
| SMIACT | X | X | System Management Interrupt Active |
| STPCLK | X | X | Stop Clock |
| TCK | X | X | Test Access Port (TAP) Clock |
| TDI | X | X | Test Access Port (TAP) Data In |
| TDO | X | X | Test Access Port (TAP) Data Out |
| TMS | X | X | Test Access Port (TAP) Test Mode Select |
| TRST | X | X | Test Access Port (TAP) Reset |
| W/R | X | X | Write or Read Cycle |
| WB/WT | X | X | Writeback or Writethrough |
A.2 Bus Interface



A.2.1 Updates to Descriptor Accessed and TSS Busy Bits 

For updates to the Accessed bit in the data and code segment descriptors, the AMD-K5 processor behaves differently from the Pentium processor. In the aligned case, the AMD-K5 processor performs two 4-byte unlocked reads to read in the descriptor. If the Accessed bit needs to be set, a 4-byte locked read and a 4-byte locked write follow. The Pentium processor performs an 8-byte unlocked read to get the descriptor. If the Accessed bit needs to be set, an 8-byte locked read and a 1-byte locked write follow.

For the misaligned case, the AMD-K5 processor performs four unlocked reads to get the descriptor. If the Accessed bit needs to be set, two locked reads and two locked writes follow. The Pentium processor performs two unlocked reads to get the descriptor. If the Accessed bit needs to be set, two locked reads are followed by one 1-byte locked write.

For updates to the Busy bit in the TSS descriptor, the AMD-K5 
processor behaves in the manner described for updates to the 
Accessed bit. The Pentium processor does not perform the 
unlocked read to get the descriptor. 

A.2.2 Locked and Unlocked CMPXCHG8B Operation 

On a locked and misaligned — not on a dword boundary — 
CMPXCHG8B operation, the AMD-K5 processor performs two 
split reads followed by two split writes, all under lock, for a 
total of eight cycles. The Pentium processor combines the split 
reads and split writes, for a total of four cycles. 

On a locked and aligned CMPXCHG8B operation, the AMD-K5 
processor performs two 32-bit locked reads followed by two 32- 
bit locked writes, all with SCYC asserted. The Pentium proces- 
sor combines these 32-bit reads and writes into one 64-bit read 
and one 64-bit write for the quadword-aligned case. The Pentium processor performs the same operations as the AMD-K5 processor for the dword-aligned but quadword-misaligned case.
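The locked CMPXCHG8B cycle counts described above can be collected into a small function. This sketch assumes that "misaligned" means the operand address is not a multiple of 4 and that "quadword-aligned" means a multiple of 8. The function name and the processor strings are hypothetical. The counts come from the text and say nothing about cycle timing. As stated in the next paragraph, unlocked non-cacheable CMPXCHG8B follows the same pattern.

```python
def locked_cmpxchg8b_bus_cycles(address, processor):
    """Number of bus cycles for a locked CMPXCHG8B (sketch of Appendix A.2.2)."""
    if processor not in ("AMD-K5", "Pentium"):
        raise ValueError("processor must be 'AMD-K5' or 'Pentium'")
    if address % 4 != 0:
        # Misaligned (not on a dword boundary).
        # AMD-K5: two split reads + two split writes = 8 cycles.
        # Pentium: combines the split reads and writes = 4 cycles.
        return 8 if processor == "AMD-K5" else 4
    if address % 8 == 0 and processor == "Pentium":
        return 2  # one 64-bit read + one 64-bit write
    # AMD-K5 at any dword-aligned address, or Pentium at a dword-aligned
    # but quadword-misaligned address: two 32-bit reads + two 32-bit writes.
    return 4
```

For an operand at 1000h, the function returns 4 for the AMD-K5 processor and 2 for the Pentium processor. At 1004h both return 4. At 1002h the AMD-K5 processor needs 8 cycles against 4 for the Pentium processor.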
On an unlocked and non-cacheable CMPXCHG8B operation, the misaligned and aligned CMPXCHG8B operations are the same as the locked misaligned and locked aligned CMPXCHG8B operations, respectively, described above.

On an unlocked and cacheable CMPXCHG8B operation, the AMD-K5 and Pentium processors behave the same.

A.2.3 Bus Cycle Order of Misaligned Memory and I/O Cycles

The AMD-K5 processor performs split (misaligned) memory read, memory write, and I/O read cycles in the reverse order of the Pentium processor. Split I/O write cycles occur in the same order on both processors.

A.2.4 Halt Cycle after FLUSH

When halted, the AMD-K5 processor reruns a Halt special cycle after the Flush Acknowledge special cycle following a cache flush operation. The Pentium processor does not rerun a Halt special cycle.

A.2.5 Selectable Drive Strengths on Output Drivers

The AMD-K5 processor supports selectable drive strengths on 
the following output pins: 

■ A20-A3 

■ W/R 

■ ADS 

■ HITM

This is the same set of output pins that have selectable drive 
strengths on the Pentium processor. However, the Pentium 
processor supports three drive strengths on these pins while 
the AMD-K5 processor supports two. 
Comments

The selection of the drive strengths differs between the Pentium processor and the AMD-K5 processor as follows:

| Processor | Drive Strength | BRDYC | BUSCHK |
| Pentium | Strength 1 (weak) | 1 | X |
| Pentium | Strength 2 (medium) | 0 | 1 |
| Pentium | Strength 3 (strong) | 0 | 0 |
| AMD-K5 Model 0 Rev E | Strength 1 (weak) | 1 | X |
| AMD-K5 Model 0 Rev E | Strength 1 (weak) | 0 | 1 |
| AMD-K5 Model 0 Rev E | Strength 2 (medium) | 0 | 0 |
| AMD-K5 Model 0 Rev F and Model 1 | Strength 1 (weak) | 1 | X |
| AMD-K5 Model 0 Rev F and Model 1 | Strength 2 (medium) | 0 | 1 |
| AMD-K5 Model 0 Rev F and Model 1 | Strength 2 (medium) | 0 | 0 |

In addition to the different selection criteria, the exact drive characteristics of the two strengths for the AMD-K5 processor Model 0 Rev E differ from those of the Intel parts. Those differences are as follows:

AMD-K5 Processor Model 0 Rev E 

■ AMD-K5 processor Drive Strength 1 is between the Pentium 
processor Drive Strength weak and medium 

■ AMD-K5 processor Drive Strength 2 is between the Pentium 
processor Drive Strength medium and strong 

■ The only way to get the AMD-K5 processor Drive Strength 2 
is to select the Pentium processor Drive Strength strong (as 
shown in the table above) 

AMD-K5 Processor Model 0 Rev F and Model 1 

■ AMD-K5 processor Drive Strength 1 equals the Pentium 
processor Drive Strength weak 

■ AMD-K5 processor Drive Strength 2 equals the Pentium 
processor Drive Strength medium 

■ The only way to get the AMD-K5 processor Drive Strength 1 
is to select the Pentium processor Drive Strength weak (as 
shown in the table above) 
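The selection table can be read as a lookup on the BRDYC and BUSCHK levels. This sketch uses the 1/0 values exactly as they appear in the table and treats X as "don't care". The part names and function name are hypothetical labels. The strength numbers are each processor's own labels. As the lists above explain, they are not electrically equivalent across parts; for example, Model 0 Rev E strength 1 lies between the Pentium weak and medium strengths.

```python
# Drive strength for each row of the selection table:
# row 0: BRDYC = 1 (BUSCHK don't care); row 1: BRDYC = 0, BUSCHK = 1;
# row 2: BRDYC = 0, BUSCHK = 0.
STRENGTH_BY_PART = {
    "Pentium": (1, 2, 3),
    "AMD-K5 Model 0 Rev E": (1, 1, 2),
    "AMD-K5 Model 0 Rev F / Model 1": (1, 2, 2),
}


def selected_drive_strength(part, brdyc, buschk):
    """Return the strength number selected by the sampled BRDYC and BUSCHK values."""
    if brdyc == 1:
        row = 0
    elif buschk == 1:
        row = 1
    else:
        row = 2
    return STRENGTH_BY_PART[part][row]
```

Under this lookup, Model 0 Rev E reaches strength 2 only with BRDYC = 0 and BUSCHK = 0, which matches the "only way" notes above.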
A.3 Bus Mastering Operations (including Snooping)



A.3.1 AHOLD Snoop to Linefill Buffer Prior to or Coincident with the 
Establishment of the Cacheability of the Line 

An AHOLD snoop to the linefill buffer occurs during a linefill 
when the address of the snoop matches the address of the line- 
fill. If the snoop happens before or coincident with the estab- 
lishment of the cacheability of the line via the KEN pin 
sampled with the assertion of NA or BRDY (whichever comes 
first), the AMD-K5 processor treats the snoop as a hit, whereas 
the Pentium processor treats it as a miss. 

This difference applies to the AMD-K5 processor Model 0. 
Model 1 of the AMD-K5 processor does not have the difference. 

Comments

In treating the snoop as a hit, the AMD-K5 processor asserts the HIT pin and also caches the line as either shared or invalid, depending on the state of the INV pin. If KEN is sampled inactive, the line is not cached, regardless of the state of the INV pin.

In treating the snoop as a miss, the Pentium processor negates 
the HIT pin and caches the line based on KEN, WB/WT, and 
PWT in the same way it does for linefills with no snoop. 

The behavior of snoops to the linefill buffer after cacheability is determined is described in Section A.3.2.

A.3.2 BOFF Asserted before Snoop to Linefill Buffer and after the Cacheability of the Line is Established

A snoop to the linefill buffer occurs during a linefill when the 
address of the snoop matches the address of the linefill. If 
BOFF is asserted after the cacheability of the line is deter- 
mined via the KEN pin being sampled active (with the asser- 
tion of NA or BRDY, whichever comes first) and a snoop to the 
linefill buffer occurs with either BOFF or AHOLD or both 
asserted, the Pentium processor treats the snoop as a hit, 
whereas the AMD-K5 processor may or may not treat it as a hit. 
For DCACHE linefills, the AMD-K5 processor treats the snoop as a miss. For ICACHE linefills, the AMD-K5 processor may treat the snoop as a hit or a miss, because the speculative nature of the linefills makes their cacheability dependent on the code sequence and, therefore, unpredictable from an external system point of view.

This difference applies to the AMD-K5 processor Model 0. 
Model 1 of the AMD-K5 processor does not have the difference. 

Comments

In treating the snoop as a hit, the AMD-K5 and Pentium processors assert the HIT pin and also cache the line as either shared or invalid, depending on the state of the INV pin. The cycle restarts after the deassertion of BOFF and AHOLD.

In treating the snoop as a miss, the AMD-K5 processor deasserts the HIT pin. The state of the line is determined based on KEN, WB/WT, and PWT when the cycle is restarted after the deassertion of BOFF and AHOLD.

The behavior of snoops to the linefill buffer before cacheability is determined is described in Section A.3.1.

A.3.3 Snoop Before Write Hit to ICACHE Appears on Bus 

If a write to a valid ICACHE line occurs and a snoop occurs to 
the same line before the write appears on the bus, the Pentium 
processor generates a snoop hit until the write is on the bus. 
The AMD-K5 processor generates a snoop miss in the window 
between when the cache is invalidated and the write appears 
on the bus. The ICACHE line is invalidated in both processors 
by the time the write appears on the bus. 

A.3.4 Invalidations during a FLUSH/WBINVD 

On the AMD-K5 processor, a snoop that occurs during a FLUSH/WBINVD, after a line has been copied back but before the Flush Acknowledge cycle, reports a hit to a modified line and generates another copyback. This happens because the two processors invalidate lines at different times. The Pentium processor invalidates lines as they are accessed during FLUSH. The AMD-K5 processor invalidates all lines at the end of a FLUSH.

Once FLUSH/WBINVD has completed, the entire cache is invalid for both the AMD-K5 and Pentium processors.
A.3.5 Cache Line Ownership

When the processor generates a read hit to a line in its own ICACHE, the Pentium processor invalidates the ICACHE line and initiates a DCACHE linefill. However, the AMD-K5 processor keeps the ICACHE line valid and performs a non-cacheable external read to supply the data.

A.3.6 Write Hit to a Shared Line in the DCACHE

When a write hits a shared line in the DCACHE, the write is passed through to the external bus. The state of the WB/WT pin is sampled with the BRDY (or NA) of the write, and if it is High, the line changes state from shared to exclusive. Subsequent writes to the same line change the state of the line from exclusive to modified and do not go external. Both the AMD-K5 and Pentium processors behave in this manner.

However, the processors differ when two or more writes to different locations within the same shared cache line are queued in the store buffer and the WB/WT pin is sampled High. The AMD-K5 processor correctly allows only the first write to reach the bus, after which the line transitions to exclusive; the remaining writes to that line do not show up on the external bus. In the Pentium processor, the first two or more writes go external, and only the remainder hit the line in the exclusive state and do not go external.
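The AMD-K5 behavior described above can be sketched as a small MESI walk over a queue of writes to one line that starts in the shared state. This sketch models only the AMD-K5 case. The Pentium's "two or more" external writes depend on store-buffer timing that the text does not specify. The sketch assumes standard write-through behavior when WB/WT is sampled Low: the line stays shared and every write goes to the bus. The function name is hypothetical.

```python
def queued_writes_to_shared_line(num_writes, wb_wt_high):
    """Return (external_writes, final_state) for queued writes to one shared line.

    Sketch of the AMD-K5 behavior in A.3.6; the line starts in the Shared state.
    """
    state = "S"
    external_writes = 0
    for _ in range(num_writes):
        if state == "S":
            external_writes += 1  # a write to a shared line is passed to the bus
            if wb_wt_high:        # WB/WT sampled High with BRDY (or NA)
                state = "E"
        elif state == "E":
            state = "M"           # completes internally, no bus cycle
        # In state "M" the write hits a modified line and stays internal.
    return external_writes, state
```

Three queued writes with WB/WT High produce one bus write and leave the line modified. With WB/WT Low, all three writes go to the bus and the line stays shared.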




A.4 Memory Management 



A.4.1 Speculative TLB Refills 

The Pentium processor performs speculative TLB refills 
(including setting the accessed bit) for code prefetches. This 
may result in the accessed bit being set for a page that is not 
actually used. The AMD-K5 processor does not perform specu- 
lative TLB refills. 

A.4.2 Page Fault Encountered by a Load/Store Type of Instruction 

On a read page fault encountered by a load/op/store type of 
instruction, the error code reported by a 486 processor indi- 
cates a read operation, whereas the Pentium processor 
indicates a write operation. The AMD-K5 processor reports the 
same error code as the 486 processor. 
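The difference shows up in bit 1 of the x86 page-fault error code, which distinguishes read accesses from write accesses. The sketch below builds the error code for a fault on the read part of a load/op/store instruction such as ADD [mem], EAX. It uses the architectural bit positions (bit 0 present/protection, bit 1 write, bit 2 user mode). The function name and the processor strings are hypothetical.

```python
# Page-fault (#PF, vector 14) error-code bits defined by the x86 architecture
PF_PRESENT = 1 << 0  # 0 = page not present, 1 = protection violation
PF_WRITE = 1 << 1    # 0 = fault on a read access, 1 = fault on a write access
PF_USER = 1 << 2     # 0 = supervisor mode, 1 = user mode


def load_op_store_read_fault_code(processor, page_present, user_mode):
    """Error code for a fault on the read part of a load/op/store instruction."""
    code = 0
    if page_present:
        code |= PF_PRESENT
    if user_mode:
        code |= PF_USER
    if processor == "Pentium":
        code |= PF_WRITE  # the Pentium processor reports a write operation
    # The 486 and AMD-K5 processors report a read operation (PF_WRITE clear).
    return code
```

A page-fault handler that tests `code & PF_WRITE` therefore sees a read on the AMD-K5 and 486 processors and a write on the Pentium processor for the same faulting instruction.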



A.5 Power Saving Features



A.5.1 STPCLK in Halt State 

When in the Halt state, the AMD-K5 processor responds to 
STPCLK and enters the Stop Grant state. The Pentium proces- 
sor ignores STPCLK in the Halt state. 

A.5.2 STPCLK Pulse does not Guarantee That One Instruction 
Executes 

Unlike the Pentium processor, the AMD-K5 processor does not 
guarantee that at least one instruction will be executed 
between the negation of STPCLK and a subsequent reassertion 
of STPCLK. On the Pentium processor, at least one instruction 
is guaranteed to execute. 

A.5.3 Simultaneous I/O SMI Trap and Debug Breakpoint Trap 

On a simultaneous I/O SMI trap and debug breakpoint trap, the 
AMD-K5 processor responds to the SMI first and postpones 
writing the fault frame for the debug trap to the stack until 
after the resumption of normal execution via RSM. (If debug 
registers DR3-DR0 are going to be used while in SMM, they 
must be saved and restored by the SMM software. DR6 and 
DR7 are automatically saved and restored.) This is similar to 
the Pentium processor behavior (P54C only) with TR12.ITR set 
to 1, although the postponing of the debug trap is only accom- 
plished with trapped I/O instructions, where the timing of the 
SMI met the requirements for SMI I/O trapping. 

On the AMD-K5 processor, if, on the RSM, the I/O Restart Flag 
in the SMM save area is set, the debug trap is cancelled and 
will be redetected as a result of the reexecution of the I/O 
instruction. 
A.5.4 SMM Save Area

The contents of any reserved locations are not necessarily the 
same between the AMD-K5 and Pentium processors. In addi- 
tion, the AMD-K5 and the Pentium processors store the IDT 
base at different locations in the SMM save area. The AMD-K5 
processor stores the IDT base at offset 7F94h, and the Pentium 
processor stores it at offset 7F90h. These locations were previ- 
ously reserved but are now documented in current Pentium 
processor documentation. 
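SMM code that reads the saved IDT base therefore has to choose the offset by processor. This sketch assumes the Intel-style convention that save-area offsets are relative to SMBASE + 8000h, with a default SMBASE of 30000h. It also assumes a 32-bit little-endian IDT base. Confirm both against the SMM chapter before relying on them. The function names and the memory-image argument are hypothetical.

```python
import struct

# Offset of the saved IDT base in the SMM state-save area (Appendix A.5.4)
IDT_BASE_OFFSET = {"AMD-K5": 0x7F94, "Pentium": 0x7F90}


def saved_idt_base_address(smbase, processor):
    """Physical address of the saved IDT base, assuming offsets from SMBASE + 8000h."""
    return smbase + 0x8000 + IDT_BASE_OFFSET[processor]


def read_saved_idt_base(image, processor):
    """Read the 32-bit IDT base from a hypothetical dump of SMBASE+8000h..SMBASE+FFFFh."""
    (base,) = struct.unpack_from("<I", image, IDT_BASE_OFFSET[processor])
    return base
```

With the default SMBASE, the sketch places the saved IDT base at 3FF94h on the AMD-K5 processor and at 3FF90h on the Pentium processor. Code that hard-codes the Pentium offset would read the wrong dword on an AMD-K5 processor.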

A.5.5 NMI Recognition during SMM

When operating in SMM, an NMI request should not be recog- 
nized unless an enabled INTR is encountered. Both the 
AMD-K5 and Pentium processors do this correctly, but in 
slightly different ways. The Pentium processor takes the NMI 
request immediately after recognizing the INTR, but before 
executing any instructions from the interrupt handler. The 
AMD-K5 processor takes the NMI request upon encountering 
the IRET in the interrupt handler. (In fact, the AMD-K5 pro- 
cessor unmasks NMI when any IRET is encountered, not just 
one associated with INTR.) 

Comment

With both processors, the Intel recommendation of using a fake INTR to unmask NMI while in SMM works correctly.
A.6 Exceptions



A.6.1 Limit Faults on an Invalid Instruction 

When executing an instruction that crosses a limit boundary 
and the instruction is interpreted as invalid, the AMD-K5 pro- 
cessor prioritizes the invalid opcode fault. The Pentium and 
486 processors prioritize the limit violation fault. 
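The priority difference can be sketched as a small decision function. It assumes the limit in question is the code-segment limit, so the limit violation is reported as a general-protection fault (vector 13). An invalid opcode is reported as vector 6. The function name and processor strings are hypothetical.

```python
INVALID_OPCODE = 6       # #UD
GENERAL_PROTECTION = 13  # #GP(0), assumed here to be the code-segment limit fault


def reported_fault(processor, opcode_invalid, crosses_limit):
    """Vector reported for an instruction (sketch of Appendix A.6.1), or None."""
    if opcode_invalid and crosses_limit:
        # AMD-K5 prioritizes the invalid opcode; Pentium and 486 the limit fault.
        return INVALID_OPCODE if processor == "AMD-K5" else GENERAL_PROTECTION
    if crosses_limit:
        return GENERAL_PROTECTION
    if opcode_invalid:
        return INVALID_OPCODE
    return None
```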

A.6.2 Task Switch 

On a task switch, the AMD-K5 processor sets the busy bit of the 
incoming task after storing the outgoing TSS according to 486 
and Pentium processor documentation. The Pentium processor 
sets the busy bit before trying to store the outgoing TSS. If a 
fault occurs while trying to store the TSS, the Pentium proces- 
sor clears the busy bit. The end result of the instruction is the 
same on both processors. 
A.7 Debug



A.7.1 Proprietary Branch Trace Messages 

Branch trace messages are different. The AMD-K5 processor 
uses the same BE pattern for the special bus cycles as the Pen- 
tium processor, but the format of decoding information is 
different. 

A.7.2 Multiple Debug Breakpoint Matches 

Multiple debug breakpoint matches on a single memory access 
do not set multiple DR6.B bits on the AMD-K5 processor. The 
Pentium processor may set multiple B-bits, regardless of 
whether the additional matching debug registers are enabled. 
On instructions that do multiple memory accesses, the Pentium 
processor sets the DR6.B bits for matching debug registers that 
are both enabled and disabled. The AMD-K5 processor only 
sets the DR6.B bits for debug registers that are enabled. 
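The enabled-versus-disabled distinction for instructions that make multiple memory accesses can be sketched using the architectural DR7 layout, where Ln is bit 2n and Gn is bit 2n+1. The `enabled_only=False` case is the upper bound of what the Pentium processor may set. The sketch does not model the AMD-K5 rule for a single access matching several registers, because the text does not say which B bit is set in that case. The function name is hypothetical.

```python
def dr6_b_bits(matching, dr7, enabled_only):
    """Return DR6 bits B3-B0 for breakpoint matches across an instruction's accesses.

    matching: indices (0-3) of debug registers whose breakpoint condition matched.
    dr7: DR7 value; Ln is bit 2n and Gn is bit 2n+1.
    enabled_only: True models the AMD-K5 rule; False the Pentium upper bound.
    """
    b_bits = 0
    for n in matching:
        enabled = ((dr7 >> (2 * n)) & 0b11) != 0  # Ln or Gn set
        if enabled or not enabled_only:
            b_bits |= 1 << n
    return b_bits
```

With only L0 enabled and matches on DR0 and DR2, the AMD-K5 rule yields B0 alone (0001b). A Pentium processor may report both B0 and B2 (0101b).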

A.7.3 Simultaneous Debug Trap and Debug Fault 

If a debug trap associated with the completion of an instruc- 
tion (single-step trap or load/store breakpoint) occurs at the 
same time as a debug fault (instruction breakpoint) on the next 
instruction, the Pentium processor merges the two conditions 
into a single call to the debug handler, setting both B bits in 
the debug status register. The AMD-K5 processor processes the 
two conditions serially, setting the appropriate B bits for each 
invocation of the handler. 



operands 4-2 

ordering 2-26 

paging 2-28 

read/write reordering 2-27 

segmentation 2-27 

SMM 6-5 

stack 4-2 

storage model 2-26 



TLBs 2-28 

MESI State 2-16,2-18 

inquire cycles 5-71 

reads 5-134 

writes 5-135 

Microcode 2-7 

Misalignment 

order of data transfers 5-147 

MMU 2-26 

Model-Specific Registers (MSRs) 3-25 

MOV to/from CR4 3-31 

Move and Convert 4-3 

MSRs 3-25 

Multiplies 4-3 

N 

NA 5-8, 5-96, 5-150 

Next Address 5-96 

NMI 5-10, 5-15-5-16, 5-97 

Noise Reduction 6-41 

Non-Maskable Interrupts 5-97 

Notation xv, 4-5 

Numeric Errors 5-79 
Opcodes

reserved 3-36 

Operands 4-2 

aligned 5-114 

alignment 5-137 

Optimization 4-1 

Output-Float Test 7-7 

Outputs at RESET 5-112 

Outputs Floated With BOFF 5-38

Outputs Floated With HLDA 5-74 
Page Cache Disable 5-99

Page Size 3-8, 3-11 

Page Size Extension 3-3, 3-5 

Page Writethrough 5-105 

Page-Directory Entry (PDE) 3-8 

Pages 

4-Mbyte 3-5, 3-8 

Page-Table Entry (PTE) 3-10 

Paging 2-28 

cacheable 5-99 

global 3-8-3-9, 3-11 

page size 3-8, 3-11 

page-directory entry 3-8 

page-table entry 3-10 
Parity 5-8-5-9

address 5-31-5-32, 5-157

data 5-57, 5-101

enable 5-102

PCD 5-9, 5-99

PCHK 5-9, 5-101, 5-141

PDE 3-8

PEN 5-9, 5-102, 5-141

Performance 4-1

Peripheral Products 6-43

Pipeline 2-4

byte queue 2-7

decode 2-7

dependencies 2-8, 2-11

dispatch 2-8

dispatch conflicts 4-3

execute 2-8

fetch 2-6

flush 5-13

flush (FLUSH) 5-66

flush (INIT) 5-82, 5-195

flush (INTR) 5-84

flush (NMI) 5-98

flush (R/S) 5-107

flush (RESET) 5-110

flush (SMI) 5-117, 5-189

flush (STPCLK) 5-123, 5-192

forwarding 2-8, 2-11-2-12, 2-16-2-17

invalidation 2-12

issue 2-8

load 2-15

performance 4-1

retirement 2-12

serialization 2-7

store 2-15, 2-24

synchronization 2-7

Power 6-38

Power Management 5-122, 6-33

PRDY 5-8, 5-10, 5-103

Precise interrupts 5-13

Predecode 2-3

Prefetch 2-3

buffer 2-3, 2-22, 2-24

Prefixes
Privilege level 5-140

Probe Mode 5-103, 5-107, 7-23 

Probe Ready 5-103 

Protected Virtual Interrupts 3-3, 3-24 

PS 3-8, 3-11 

PSE 3-3 

PTE 3-10 

Public TAP Instructions 7-22 



PVI 3-3, 3-24 

PWT 5-9, 5-105, 5-150 
R/S 5-10, 5-16, 5-107

RDMSR 3-33 

RDTSC 3-32 

Reads 

I/O 5-146 

MESI state 5-134 

reordering 2-27 

single-transfer from memory 5-141 

single-transfer misaligned 5-147 

W/R 5-132 

Real Mode 

transition from protected mode 5-195 

References xviii 

Register 

file 2-12 

Registers 

AAR 7-8 

CR4 3-2, 3-31 

debug 7-16 

DR7-DR0 7-16

EFLAGS 3-15 

HWCR 7-3 

MCAR 3-4, 3-25 

MCTR 3-4, 3-26 

model-specific 3-25 

MSRs 3-25 



operands 4-2 

state after RESET or INIT 5-110 

TAP device ID 7-21 

Reorder Buffer (ROB) 2-11 

Reordering of Reads and Writes 2-27 

Replacement 

buffer 2-25 

cache 2-20 

Reserved Opcodes 3-36 

RESET 5-8, 5-10, 5-109 

Reset (soft) 5-81 

Retirement 2-12, 2-24 

ROB 2-11 

ROPs 2-7-2-8

RSM 3-35 
SCYC 5-8, 5-114

Segmentation 2-27 

Self-Modifying Code 2-21, 2-23 

Serialization 2-7 

Serializing instructions 2-8 

Shift Units 2-9 

Shifts 4-2 

Shutdown Cycle 5-182 

Shutdown State 5-8, 5-35, 5-180 



A20M 5-8, 5-18, 6-22

A31-A3 5-8, 5-20, 5-137 

address 5-8 

ADS 5-8, 5-24, 5-136 

ADSC 5-8, 5-27 

AHOLD 5-8, 5-28, 5-157, 5-159-5-160, 6-17 

AP 5-8, 5-31 

APCHK 5-8, 5-32, 5-157 

BE7-BE0 5-33, 5-56, 5-137

BF 5-10, 5-36

BOFF 5-8, 5-37, 5-162, 5-164, 5-173, 6-15

BRDY 5-9, 5-41, 5-137, 5-150

BRDYC 5-9

BREQ 5-8, 5-45 

bus arbitration 5-8 

BUSCHK 5-10, 5-16, 5-46 



byte enables 5-33 

CACHE 5-9, 5-49, 5-136 

cache control 5-9 

characteristics 5-4 

CLK 5-10, 5-36, 5-52, 5-192 

clock 5-10 

compatibility A-2 

cycle definition and control 5-8 

D/C 5-8, 5-53, 5-136 

D63-D0 5-9, 5-55 

data 5-9 

debug 5-10 

descriptions 5-17 

DP7-DP0 5-9, 5-56-5-57



drive strength 5-46 
EADS 5-9, 5-58

EWBE 2-26, 5-8, 5-62, 5-144 

FERR 5-9, 5-64

floated 5-4 

floating-point error 5-9 

FLUSH 5-10, 5-16, 5-35, 5-65, 5-180, 5-183 

FRCMC 5-10, 5-68 

groups 5-3 

HIT 5-9, 5-70 

HITM 5-9, 5-72

HLDA 5-8, 5-74, 5-166, 5-168 

HOLD 5-8, 5-76, 5-166, 5-168, 6-19 

IERR 5-10, 5-78

IGNNE 5-9, 5-79

INIT 5-8, 5-10, 5-16, 5-81, 5-195 

inquire cycle 5-9 

internal resistors 5-4 

interrupt 5-10 

interrupt-acknowledgement 5-10 

INTR 5-10, 5-15-5-16, 5-84, 5-175 

INV 5-9, 5-88 

KEN 5-9, 5-89, 5-136, 5-150 

LOCK 5-8, 5-91 
TDO 5-10, 5-129

test 5-10 

TMS 5-10, 5-130 

TRST 5-10, 5-131 

W/R 5-8, 5-132, 5-136 

WB/WT 5-9, 5-133, 5-150 

Simultaneous Interrupts 5-15 



SMI 5-10, 5-16, 5-116, 5-189 

SMIACT 5-8, 5-10, 5-121, 5-189 

SMM 5-116, 5-121, 6-23 
exceptions and interrupts in SMM 6-32

Halt restart 6-30 

I/O restart 6-31 

I/O trap dword 6-31 

initial state 6-25 



memory map 6-5 

revision identifier 6-28 

RSM instruction 3-35 

state-save area 6-25 

timing 5-189 

transition from normal execution 5-189 



Snoop xvii 

Snoop. See also Internal Snooping xvii 

Snoops 2-21, 6-12 

See also, Inquire Cycles 5-8 

writeback buffer 2-26 
Software Extensions 3-1

4-Mbyte pages 3-8, 3-11 

branch tracing 7-17 

debug control 7-4 

debugging extensions (DE) 3-3 

disable branch prediction 7-4 

disable data cache 7-4 

disable instruction cache 7-4 
global page extension (GPE) 3-3, 3-8-3-9, 3-11



I/O breakpoints 7-16 

interrupt redirection bitmap (IRB) 3-21 

machine check 3-3 

machine check enable (MCE) 3-4 

page size extension (PSE) 3-3, 3-5 

protected virtual interrupts (PVI) 3-3, 3-24

system call 3-4 

time stamp disable (TSD) 3-3, 3-27 

Virtual-8086 Mode extension (VME) 3-3, 3-12

Software Interrupts 3-21, 5-13, 5-86 

Special Bus Cycles 5-8, 5-180 

branch tracing 7-17 

branch-trace message 5-187 

cache-invalidation 5-184 

cache-writeback and invalidation 5-185 

encoding 5-35, 5-180 

FLUSH acknowledge 5-183 

interrupt acknowledge 5-85, 5-175 

shutdown 5-182 

Speculative Execution 2-10 

Spikes 5-30 

Split Cycles 5-114 

Stack 

allocation 4-2 

references 4-2 



State 

Halt 5-8, 5-35, 5-123, 5-180, 7-4 

Shutdown 5-8, 5-35, 5-180 

Stop-Clock 5-8, 5-52, 5-125 

Stop-Grant 5-8, 5-35, 5-124, 5-180, 7-4 

Stop-Grant Inquire 5-124 

States 



halt 6-34 

stop-clock 6-38 

stop-grant 6-37 

stop-grant inquire 6-37 

Stop-Clock State 5-8, 5-52, 5-125, 5-192, 6-38

Stop-Grant Inquire State 5-124, 6-37 

Stop-Grant State 5-8, 5-35, 5-124, 5-180, 5-192, 6-37, 7-4

Storage 

EWBE 2-26 

model 2-26 

ordering 2-26 

read/write reordering 2-27 

Store 2-15, 2-24 

Store Buffer 2-8, 2-11-2-12, 2-22, 2-24 

STPCLK 5-10, 5-16, 5-35, 5-122, 5-180, 5-192, 6-33 

Strong Memory Order 2-26 

Successor index 2-6 

Synchronization 2-7 

System Call 3-4 

System Design 6-1 

System Management Mode. See SMM 3-35 
Tags

linear 2-16 
recovery 2-17

TAP 5-127-5-131, 7-19 

Task Switches 2-16 

TCK 5-10, 5-127 

TDI 5-10, 5-128 

TDO 5-10, 5-129 

Terminology xvi 

Test 7-1 

AAR 7-8 

arrays 7-7 

BIST 7-5 

boundary scan 7-19 

cache 7-7 

clock 5-127 

data input 5-128 

data output 5-129 

float 7-7 

functional redundancy 7-18 

HDT 7-23 

HWCR 7-3 

instructions 7-22 

mode select 5-130 

PRDY 5-103 

R/S 5-107

reset 5-131 

TAP 7-19 

TAP device ID 7-21 

TLBs 7-7 

Test Access Port (TAP) 

TCK 5-127 

TDI 5-128 

TDO 5-129 

TMS 5-130 

TRST 5-131 

Test Signals 5-10 

Thermal Design 6-42 

Time Stamp Counter (TSC) 3-3, 3-27 

Time Stamp Disable 3-3, 3-27 

Time-Stamp Counter (TSC) 3-32 

TLBs 2-28 

testing 7-7 

TLB miss 5-171 

TMS 5-10, 5-130 

Triple Fault 5-35, 5-180 

Tristate Test 7-7 

TRST 5-10, 5-131 

TSC 3-3, 3-27, 3-32 

TSD 3-3, 3-27
Undefined Flags 4-2

USEHDT 5-103, 5-107, 7-23 
VIF 3-13, 3-15

VIP 3-13, 3-15

Virtual Interrupt Flag (VIF) 3-13, 3-15 

Virtual Interrupt Pending (VIP) flag 3-13, 3-15
Virtual-8086 Mode Extensions (VME) 3-3, 3-12
VME 3-3, 3-12 

W

W/R 5-8, 5-132, 5-136 

Wait States 5-41 

WB/WT 5-9, 5-133, 5-150 



WBINVD 

Weak Memory Order 
Writebacks 



buffers 

Write-Once Protocol 
5-35, 5-180
xvii, 2-18-2-20, 5-105
5-133, 5-153, 6-10
2-8, 2-22, 2-25-2-26
6-19 



effect of EWBE 5-144 

EWBE 2-26 

I/O 5-146 

MESI state 5-135 

reordering 2-27 

single-transfer from memory 5-141 

single-transfer misaligned 5-147 

strongly ordered 2-26 

W/R 5-132 
WRMSR 3-33